CITYU Theses & Dissertations
Thesis Details
Author: 张宏梅
Supervisor: 郭永德
Faculty of Data Science
Master of Data Science Programme (Chinese-medium)
Master's degree
Year of admission: 2022
Research on Dynamic Scene Image Deblurring Method Based on Transformer
Keywords: Image deblurring; Vision Transformer; Window self-attention; Fourier transform
Public release date: 29/5/2026
Abstract:
Images are the primary carrier of information in the human visual system. During imaging, image quality is often degraded by the shooting scene, the imaging device, and other factors, the most common degradation being blur. Blurred images not only fail to meet human visual needs but also hamper the practical use of images in many important fields, so recovering texture details from a blurred image and reconstructing a sharp image is valuable. Blur in dynamic scenes is usually non-uniform, and its causes are complex, including fast-moving objects and occlusion at object boundaries, which makes restoration difficult. Dynamic scene deblurring is therefore a challenging task.
The Transformer model can capture long-range correlations between image pixels and performs remarkably well on computer vision tasks. This thesis combines convolutional neural networks with the Transformer architecture to study dynamic scene image deblurring.
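As background for the window self-attention mechanism named in the keywords, the sketch below shows, in PyTorch, how a vision Transformer can restrict self-attention to non-overlapping local windows so that long-range pixel correlations are modeled at a manageable cost. This is a minimal illustration in the spirit of Swin-style window attention, not the thesis's actual implementation; all module and parameter names are assumptions.

```python
import torch
import torch.nn as nn

class WindowSelfAttention(nn.Module):
    """Multi-head self-attention computed inside non-overlapping windows
    (illustrative sketch, not the thesis's code)."""
    def __init__(self, dim=32, window=8, heads=4):
        super().__init__()
        self.window, self.heads = window, heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, H, W, C)
        B, H, W, C = x.shape
        w = self.window
        # Partition the feature map into (H//w * W//w) windows of w*w tokens.
        x = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(-1, w * w, C)             # (num_windows*B, w*w, C)
        qkv = self.qkv(x).reshape(x.shape[0], w * w, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (nW*B, heads, w*w, C/heads)
        attn = (q @ k.transpose(-2, -1)) * (C // self.heads) ** -0.5
        out = attn.softmax(dim=-1) @ v          # attention restricted to the window
        out = out.transpose(1, 2).reshape(-1, w * w, C)
        out = self.proj(out)
        # Reverse the window partition back to (B, H, W, C).
        out = out.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)

# Example: a 32-channel 64x64 feature map attends within 8x8 windows.
feat = torch.randn(1, 64, 64, 32)
print(WindowSelfAttention()(feat).shape)       # torch.Size([1, 64, 64, 32])
```

Restricting attention to w*w windows reduces the quadratic cost from (HW)^2 to HW * w^2, which is what makes Transformer blocks practical at image resolution.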
First, this thesis proposes an improved image deblurring method based on the U-Net network. Adding a local connection that fuses channel information at each layer of the encoder enhances the network's feature extraction capability, and the method is compared with other classical models. The experimental results show that, compared with classical approaches such as multi-scale networks and generative adversarial networks, the improved U-Net model extracts features more effectively and processes blurred images more efficiently.
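To make the encoder-side local connection concrete, here is a minimal PyTorch sketch of a U-Net encoder block with an extra within-layer connection whose channel information is fused by a squeeze-and-excitation-style gate. The gating choice and all names are assumptions made for illustration; the thesis's actual block may differ.

```python
import torch
import torch.nn as nn

class FusedEncoderBlock(nn.Module):
    """U-Net encoder block with an extra local connection that fuses
    channel information (sketch; the gating choice is an assumption)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1)   # local, within-layer connection
        self.gate = nn.Sequential(                 # channel-information fusion
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(out_ch, out_ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        y = self.body(x)
        return y * self.gate(y) + self.skip(x)     # reweight channels, add local skip

x = torch.randn(1, 3, 128, 128)
print(FusedEncoderBlock(3, 32)(x).shape)           # torch.Size([1, 32, 128, 128])
```

The gate pools each channel to a scalar and reweights the feature map, so the skip path carries channel-aware rather than raw features into the next encoder stage.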
Second, CNN-based methods struggle to restore large-scale motion blur in dynamic scenes and the blurred background details caused by moving-object occlusion. To address this, the thesis proposes a dynamic scene image deblurring method that combines the Transformer architecture with fast Fourier convolution. Fast Fourier convolution fuses the local features of the image with frequency-domain global features across scales, a Transformer module based on the Fourier transform computes self-attention in the frequency domain, and a contrastive learning loss is used to optimize the model. The improved method performs well in comparative experiments on the GoPro dataset: its PSNR of 32.98 dB is at least 0.32 dB higher than most CNN-based methods, its parameter size of 17.9 MB is lower than that of the other methods compared, its single-image restoration time is 2.19 s, and its SSIM of 0.970 is mid-range among the compared deblurring methods. The restored images show that the improved method effectively removes large-scale motion blur and yields better brightness and contrast.
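The fast Fourier convolution this method builds on (Chi et al., NeurIPS 2020) routes part of each convolution through a "Fourier unit": an FFT, a pointwise convolution on the stacked real and imaginary spectra, and an inverse FFT, which gives every output position an image-wide receptive field in a single layer. Below is a minimal PyTorch sketch of such a unit; channel sizes, the normalization choice, and all names are illustrative assumptions, not the thesis's implementation.

```python
import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    """Global receptive field in one step: FFT -> pointwise conv on the
    spectrum -> inverse FFT, in the spirit of fast Fourier convolution."""
    def __init__(self, ch):
        super().__init__()
        # Real and imaginary parts are stacked along channels: 2*ch -> 2*ch.
        self.conv = nn.Sequential(
            nn.Conv2d(2 * ch, 2 * ch, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):                  # x: (B, C, H, W), real-valued
        B, C, H, W = x.shape
        spec = torch.fft.rfft2(x, norm="ortho")        # (B, C, H, W//2+1), complex
        f = torch.cat([spec.real, spec.imag], dim=1)   # (B, 2C, H, W//2+1)
        f = self.conv(f)                                # each spectral bin mixes
        real, imag = f.chunk(2, dim=1)                  # information from all pixels
        spec = torch.complex(real, imag)
        return torch.fft.irfft2(spec, s=(H, W), norm="ortho")

x = torch.randn(1, 16, 64, 64)
print(FourierUnit(16)(x).shape)            # torch.Size([1, 16, 64, 64])
```

Because each frequency bin aggregates the whole spatial field, operating on the spectrum (whether by convolution, as here, or by the thesis's frequency-domain self-attention) captures the global structure that large motion blur spreads across the image.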
Year of completion: 2023
Language: Chinese
Pages: 62
Table of Contents
Acknowledgements III
Abstract (in Chinese) IV
Abstract (in English) V
List of Figures IX
List of Tables X
Chapter 1 Introduction 1
1.1 Research Background and Significance 1
1.2 Domestic and International Research Status of Image Deblurring 2
1.2.1 Traditional Image Deblurring Methods 2
1.2.2 Image Deblurring Methods Based on Convolutional Neural Networks 4
1.2.3 Image Deblurring Methods Based on Transformer 6
1.3 Research Problems and Research Content 8
1.4 Contributions and Thesis Organization 9
Chapter 2 Theoretical Foundations of Image Deblurring 10
2.1 Image Deblurring Theory 10
2.1.1 Classification of Image Blur Types 10
2.1.2 The Fourier Transform of Images 11
2.1.3 Quality Evaluation Metrics for Image Deblurring 12
2.2 Convolutional Neural Network Models 13
2.2.1 The U-Net Model 13
2.2.2 The ResNet Model 14
2.3 The Transformer Model 15
2.3.1 Transformer Architecture 16
2.3.2 Key Modules of the Network 17
2.3.3 Vision Transformer Applications 19
2.4 Chapter Summary 21
Chapter 3 A U-Net-Based Image Deblurring Method 22
3.1 Network Architecture 22
3.2 Loss Function 23
3.3 Experimental Results and Analysis 24
3.3.1 Datasets and Preprocessing 24
3.3.2 Experimental Environment and Parameter Settings 25
3.3.3 Results and Analysis on the GoPro Dataset 26
3.3.4 Results and Analysis on the HIDE Dataset 28
3.4 Effectiveness Validation Experiments 30
3.5 Chapter Summary 30
Chapter 4 An Improved Transformer-Based Image Deblurring Method 31
4.1 Network Architecture 31
4.1.1 Encoder-Decoder Architecture 31
4.1.2 Fast Fourier Convolution Projection Module 33
4.1.3 Fourier-Unit-Based Transformer Module 36
4.2 Loss Function 39
4.3 Experimental Results and Analysis 40
4.3.1 Implementation Details 40
4.3.2 Results and Analysis on the GoPro Dataset 40
4.3.3 Results and Analysis on the HIDE Dataset 43
4.3.4 Results and Analysis on the RealBlur Dataset 46
4.4 Effectiveness Validation Experiments 47
4.4.1 Effectiveness of the Fast Fourier Convolution Projection Module 48
4.4.2 Effectiveness of the Fourier-Unit Self-Attention Mechanism 49
4.5 Chapter Summary 50
Chapter 5 Conclusions and Outlook 51
5.1 Conclusions 51
5.2 Future Work 52
References 54
Author's Biography 60
Appendix 61