Images are the primary information carriers for the human visual system. During imaging, image quality is often degraded by the shooting scene, the imaging equipment, and other objective factors, the most common degradation being image blur. Blurred images not only fail to meet viewers' visual needs but also limit the practical use of images in many important fields. It is therefore valuable to recover the texture and detail features of a blurred image and reconstruct a sharp one. Blur in dynamic scenes is usually non-uniform, and its causes are complex, including fast-moving objects and occlusion at object boundaries, which makes recovery difficult. Dynamic scene deblurring is therefore a challenging task.
The Transformer model is able to capture long-range correlations between image pixels and performs well on computer vision tasks. In this thesis, we combine Convolutional Neural Networks (CNNs) with the Transformer architecture to study dynamic scene image deblurring.
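The long-range modeling ability attributed to the Transformer above comes from scaled dot-product self-attention, where every output token is a weighted mixture of all input tokens. A minimal numpy sketch (the projection matrices `wq`, `wk`, `wv` are illustrative placeholders, not the thesis model's actual parameters):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over n tokens.

    x: (n, d) token features; wq/wk/wv: (d, d) projection matrices.
    Each output token is a weighted sum over ALL tokens, which is what
    lets a Transformer relate pixels that are far apart in the image.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[1])          # (n, n) pairwise affinities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax
    return weights @ v                              # (n, d) attended features
```

For images, the n tokens are typically flattened patches or pixels, so the (n, n) weight matrix is exactly the "correlation over long distances" the text refers to.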
Firstly, this thesis proposes an improved image deblurring method based on the U-Net network. The feature extraction capability of the network is enhanced by adding, in each encoder layer, local connections that fuse channel information, and the method is compared with other classical models. The experimental results show that, compared with classical methods such as multi-scale networks and generative adversarial networks, the improved U-Net model has stronger feature extraction ability and higher efficiency in processing blurred images.
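One plausible form of a "local connection that fuses channel information" is a residual connection gated by per-channel statistics, in the spirit of channel attention. The sketch below is a hypothetical illustration of that idea, not the thesis's exact module:

```python
import numpy as np

def channel_fused_skip(feat):
    """Hypothetical channel-fusion local connection for a U-Net encoder layer.

    feat: (c, h, w) feature map. Global average pooling summarizes each
    channel, a softmax turns the summary into per-channel weights, and the
    reweighted map is added back to the input as a residual, so informative
    channels are emphasized before the next downsampling step.
    """
    desc = feat.mean(axis=(1, 2))                   # (c,) channel descriptor
    w = np.exp(desc - desc.max())
    w /= w.sum()                                    # per-channel weights
    return feat + w[:, None, None] * feat           # residual channel fusion
```

Because the connection is residual, it preserves the original encoder features while letting the channel statistics modulate them, which matches the stated goal of strengthening feature extraction without changing the U-Net topology.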
Secondly, this thesis proposes a dynamic scene image deblurring method that combines the Transformer architecture with fast Fourier convolution, addressing the inability of CNN-based methods to effectively restore large-scale motion blur and the blurred background details caused by moving-object occlusion in dynamic scenes. Fast Fourier convolution is used to fuse local features with frequency-domain global features across scales, and a Transformer module based on the Fourier transform is constructed to compute self-attention in the frequency domain. In addition, a contrastive learning loss is used to optimize the model. The improved method performs well in comparative experiments on the GoPro dataset, achieving a PSNR of 32.98 dB, at least 0.32 dB higher than most CNN-based methods. The model size is 17.9 MB, smaller than that of the compared methods; the single-image restoration time is 2.19 seconds; and the SSIM is 0.970, a moderate level among deblurring methods. The restored images show that the improved method can effectively remove some large-scale motion blur and performs better in brightness and contrast.
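The global branch of a fast-Fourier-convolution-style block, and the PSNR metric reported above, can both be sketched in a few lines of numpy. The per-frequency weights `w_real`/`w_imag` stand in for learned parameters and are assumptions for illustration, not the thesis's trained values:

```python
import numpy as np

def spectral_global_branch(feat, w_real, w_imag):
    """Global branch of a fast-Fourier-convolution-style block (illustrative).

    feat: (h, w) feature map. A real 2-D FFT moves it to the frequency
    domain, where every frequency bin mixes information from the whole
    spatial plane; a per-bin complex weight then acts like a convolution
    with a global receptive field; the inverse FFT returns to pixels.
    w_real/w_imag: (h, w//2 + 1) per-frequency filter weights.
    """
    spec = np.fft.rfft2(feat)                       # (h, w//2 + 1) complex
    spec = spec * (w_real + 1j * w_imag)            # pointwise spectral filter
    return np.fft.irfft2(spec, s=feat.shape)        # back to the spatial domain

def psnr(ref, out, peak=1.0):
    """Peak signal-to-noise ratio in dB, the objective metric cited above."""
    mse = np.mean((ref - out) ** 2)
    return 10 * np.log10(peak ** 2 / mse)
```

With all-ones real weights and zero imaginary weights the spectral filter is the identity, which is a convenient sanity check; any other weight pattern filters the image globally at the cost of only two FFTs.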