In indoor security surveillance, and particularly in specialized settings such as tendering rooms, object detection must cope with small targets whose appearance is strongly affected by lighting and occlusion, which raises the risk of missed and false detections. This paper presents an enhanced YOLOv8-based model, YOLOv8-ASFB, designed to improve detection accuracy in such environments. The key contributions of this study are:
(1) Optimization for Small Object Detection: Because small objects in surveillance scenes carry sparse feature information, this study employs space-to-depth convolution (SPDConv) to retain more target information in the channel dimension without increasing the size of the feature maps. The subsequent modules are also enhanced to extract and process these fine-grained features more effectively, improving the detection accuracy of small objects.
(2) Addressing Complex Environments: To mitigate uneven lighting and object occlusion, this study introduces the ASFB module, which integrates SPDConv with an attention mechanism and thereby strengthens feature capture in complex environments. In addition, the study combines the Slide Loss weighting function with the conventional binary cross-entropy loss and adopts the resulting Slide-BCE Loss as the classification loss. This combination further reduces missed detections and improves the model's overall detection performance.
(3) Comprehensive Improvement in Model Performance: By integrating the techniques above, the proposed YOLOv8-ASFB model improves object detection performance. Ablation studies and comparative experiments on attention mechanisms validate the effectiveness of each improvement and confirm the model's strong performance in reducing missed detections. Experimental results show that the model improves the accuracy of object identification against complex backgrounds and addresses the main challenges of the tendering room surveillance environment.
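The space-to-depth rearrangement behind contribution (1) can be illustrated with a minimal sketch. It assumes a feature map stored as a list of channels, each a list of rows, and a hypothetical helper name `space_to_depth`; it shows only the rearrangement step, not the non-strided convolution that would follow it in an SPDConv block, and it is not the implementation used in this study.

```python
def space_to_depth(fmap, scale=2):
    """Rearrange a C x H x W feature map into (C * scale**2) x (H/scale) x (W/scale).

    fmap is a list of channels; each channel is a list of rows (lists of numbers).
    Every input value is kept: spatial detail moves into the channel dimension
    instead of being discarded as it would be by a strided convolution or pooling.
    """
    height, width = len(fmap[0]), len(fmap[0][0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")
    out = []
    for dy in range(scale):          # row offset of the sub-map
        for dx in range(scale):      # column offset of the sub-map
            for channel in fmap:
                # Take every `scale`-th row and column starting at (dy, dx).
                out.append([row[dx::scale] for row in channel[dy::scale]])
    return out


# One 4 x 4 channel holding the values 0..15, row by row.
feature_map = [[[r * 4 + c for c in range(4)] for r in range(4)]]
sub_maps = space_to_depth(feature_map, scale=2)
print(len(sub_maps), len(sub_maps[0]), len(sub_maps[0][0]))  # channels, height, width
```

With `scale=2`, each input channel becomes four half-resolution channels, so the spatial size does not grow while all original values remain available to later layers.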
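The Slide-BCE classification loss named in contribution (2) can be sketched as a per-sample binary cross-entropy term multiplied by a Slide weight. The piecewise weighting below follows the Slide Loss form commonly attributed to YOLO-FaceV2 and is an assumption here, since this section does not restate the formula; the function names, the 0.1 margin, and the use of the mean IoU as the threshold `mu` are likewise illustrative rather than taken from this study.

```python
import math


def slide_weight(iou, mu):
    """Assumed Slide weighting: emphasize samples whose IoU is near the threshold mu."""
    if iou <= mu - 0.1:
        return 1.0                   # clearly easy negatives keep unit weight
    if iou < mu:
        return math.exp(1.0 - mu)    # samples just below the threshold are boosted
    return math.exp(1.0 - iou)       # positives: weight decays as IoU grows


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one predicted probability and a 0/1 target."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def slide_bce(probs, targets, ious, mu=None):
    """Mean of Slide-weighted BCE terms; mu defaults to the mean IoU of the batch."""
    if mu is None:
        mu = sum(ious) / len(ious)
    terms = [slide_weight(i, mu) * bce(p, t) for p, t, i in zip(probs, targets, ious)]
    return sum(terms) / len(terms)


# Hypothetical predictions, labels, and IoUs for three samples.
loss = slide_bce(probs=[0.9, 0.4, 0.2], targets=[1, 1, 0], ious=[0.8, 0.45, 0.1])
print(round(loss, 4))
```

Under this weighting, hard samples close to the IoU threshold contribute more to the loss than easy ones, which is consistent with the stated aim of reducing missed detections.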