CITYU Theses & Dissertations
Thesis Details
李济瀚
楊雲
Faculty of Data Science
Master of Data Science (Chinese-medium programme)
Master's degree
2024
基於YOLOv8的評標室場景智能檢測研究
Research on YOLOv8-Based Intelligent Detection for Tendering Room Environments
Indoor Surveillance Imaging ; Object Detection ; Attention Mechanisms ; Loss Functions
Public release date: 2/8/2027
In indoor security surveillance, and in tendering rooms in particular, object detection faces several challenges: small objects, lighting variation and occlusion in the scene, and certain classes that are inherently hard to detect. Together these factors substantially increase the risk of missed and false detections. To address them, this study proposes YOLOv8-ASFB, an improved model based on YOLOv8 that aims to raise detection quality in this specific surveillance setting. The main contributions are:
(1) Optimization for small object detection: Because small objects in surveillance scenes carry sparse feature information, this study adopts the space-to-depth convolution SPDConv, which retains more target information in the channel dimension without increasing the spatial size of the feature maps. Subsequent modules are strengthened to extract and process these fine-grained features, which significantly improves the detection rate for small objects.
(2) Addressing complex environments: To mitigate lighting effects and object occlusion, this study proposes ASFB, an attention-enhanced spatial convolution module that combines the properties of SPDConv with an attention mechanism and strengthens the model's ability to capture features in complex environments. In addition, the slide loss function is combined with the conventional binary cross-entropy loss to form an improved Slide-BCE Loss, which is used as the classification loss. This further reduces missed detections and improves the model's overall detection performance.
(3) Comprehensive improvement in model performance: By combining the techniques above, the proposed YOLOv8-ASFB model significantly improves object detection performance. A series of ablation experiments and comparative experiments on attention mechanisms validates the effect of each individual improvement and confirms that YOLOv8-ASFB performs well at reducing missed detections. The experimental results show that the model markedly improves the accuracy of object identification against complex backgrounds and effectively handles the range of challenges found in the tendering room surveillance environment.
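The space-to-depth rearrangement that contribution (1) relies on can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `space_to_depth`, the scale factor of 2, the single-channel nested-list input, and the order of the output sub-maps are all assumptions made for illustration. In an SPDConv-style layer this rearrangement would normally be followed by a non-strided convolution, which is omitted here.

```python
def space_to_depth(feature_map, scale=2):
    """Split an H x W grid into scale*scale sub-maps of size (H/scale) x (W/scale).

    Hypothetical helper for illustration only. Every input value is kept;
    spatial resolution is traded for extra channels instead of being
    discarded by a strided convolution or pooling step.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % scale or width % scale:
        raise ValueError("height and width must be divisible by scale")

    channels = []
    for dy in range(scale):          # row offset inside each scale x scale cell
        for dx in range(scale):      # column offset inside each cell
            sub_map = [row[dx::scale] for row in feature_map[dy::scale]]
            channels.append(sub_map)
    return channels


# A 4 x 4 single-channel map becomes four 2 x 2 sub-maps.
grid = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
channels = space_to_depth(grid)
for index, sub_map in enumerate(channels):
    print(index, sub_map)
```

Each sub-map is half the height and width of the input, and the four sub-maps together contain all sixteen original values, which is the sense in which information is moved into the channel dimension rather than lost.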
2024
Chinese
63
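Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Current State of Research in China and Abroad
1.2.1 Intelligent Surveillance
1.2.2 Small Object Detection
1.3 Research Content
1.4 Research Challenges
1.5 Main Contributions
1.6 Thesis Organization
1.7 Chapter Summary
Chapter 2 Development of Related Algorithms and Theoretical Foundations
2.1 Introduction
2.2 Convolutional Neural Networks
2.2.1 Convolutional Layer
2.2.2 Pooling Layer
2.2.3 Activation Layer
2.2.4 Fully Connected Layer
2.3 Convolutional Neural Network Models
2.4 Traditional Object Detection Methods
2.4.1 Region-Proposal-Based Object Detection Methods
2.5 YOLO-Based Object Detection Methods
2.5.1 Development of the YOLO Series
2.5.2 Network Structure of YOLOv8
2.6 Attention Mechanisms
2.6.1 Squeeze-and-Excitation
2.6.2 Convolutional Block Attention Module
2.6.3 Coordinate Attention
2.7 Chapter Summary
Chapter 3 Network Structure and Improvement Strategies
3.1 Introduction
3.2 Shortcomings of the YOLO Algorithm in the Current Scenario
3.2.1 Changes in Lighting Conditions
3.2.2 Detection of Occluded Objects
3.2.3 Small Object Detection
3.2.4 Class Imbalance among Samples
3.3 Network Structure Improvements
3.4 Convolution Structure Improvements
3.5 Introduction of Attention Mechanisms
3.5.1 Mixed Local Channel Attention
3.5.2 Attention-Enhanced Spatial Convolution Block
3.6 Loss Function Improvements
3.6.1 Binary Cross-Entropy Loss
3.6.2 Slide Cross-Entropy Loss
3.7 Chapter Summary
Chapter 4 Experiments and Analysis
4.1 Dataset Description
4.2 Evaluation Metrics
4.3 Experimental Environment
4.3.1 Hardware and Software Environment
4.3.2 Model Configuration
4.4 Experimental Results and Analysis
4.4.1 Baseline Model Training
4.4.2 Model Comparison Experiments
4.4.3 Attention Mechanism Comparison Experiments
4.4.4 Ablation Experiments
4.4.5 Analysis of Test Results
4.5 Chapter Summary
Chapter 5 Conclusion and Outlook
5.1 Conclusion
5.2 Outlook
References
Author's Curriculum Vitae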