CITYU Theses & Dissertations
Thesis Details
Author: 贡澄莹
Supervisor: 郭永德
Faculty of Data Science
Master of Data Science Programme (Chinese-medium)
Master
2023
Research on the super-resolution of remote sensing images based on Transformer network
Remote sensing image super-resolution (SR); Transformer; sparse representation; convolutional neural networks (CNNs); sub-pixel space
Remote sensing images play an increasingly important role in land monitoring, fire control, military reconnaissance, and other fields. Spatial resolution is a key property of such images and significantly affects all of these applications. Compared with upgrading remote sensing imaging hardware, super-resolution based on deep neural networks offers a far more convenient route to improved resolution; as deep network technology has matured, super-resolution has reached impressive generation quality and become an effective means of enhancing image resolution. Relative to general natural images, however, super-resolution of remote sensing images is constrained by challenges specific to the domain: complex relationships among ground objects, and large, highly redundant volumes of data. The Transformer has recently achieved major breakthroughs across vision tasks and promises to better capture the global features of the internal self-similarity in remote sensing images. Yet existing methods often neglect to optimize the interaction between convolution and Transformer blocks, which significantly limits performance on remote sensing super-resolution tasks, and they typically adopt limited optimization schemes built from combinations of attention modules, which restricts feature selection and harms super-resolution quality. Moreover, while self-attention provides a global receptive field, it incurs a considerable computational cost, so directly stacking more attention mechanisms complicates real-world deployment of the overall network.
To address these problems and challenges, this thesis designs a lighter-weight algorithm that fuses convolution with the Transformer and improves the self-attention computation. To this end, we propose the Sparse-activated Sub-pixel Transformer Network (SSTNet), which draws on sparse-coding theory in the self-attention computation to improve the network's generative performance. The network adjusts the self-attention encoding process by introducing sparse representations, and builds its decoder in sub-pixel space through multi-level fusion of convolutional and Transformer features; parameter sharing is used to trim the decoder's computational cost. The proposed network was evaluated with 2x, 3x, and 4x super-resolution reconstruction experiments on two mainstream remote sensing image datasets, UC Merced and AID, using PSNR and SSIM as quantitative metrics (a reference sketch of both metrics follows the table of contents below). On both datasets, the proposed method achieves competitive scores against several state-of-the-art methods and produces visually sharper contours. In the per-category analysis at 3x super-resolution, it ranks best in 8 categories of the UC Merced dataset and in all categories of the AID dataset, showing that the proposed strategies deliver an effective and stable improvement. Ablation studies further compare the proposed strategies and confirm the method's effectiveness and reliability.
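The abstract describes two mechanisms at the core of SSTNet: a sparsely activated self-attention encoder and a decoder that upsamples in sub-pixel space. As a rough illustration only, here is a minimal PyTorch sketch of both ideas, assuming top-k truncation of the attention map as the sparse-activation rule and a single PixelShuffle stage as the sub-pixel step; all class names are hypothetical, and the thesis's actual formulation is the one given in Chapter 3.

    # Hypothetical sketch, not SSTNet itself: a self-attention block whose
    # attention map is sparsified (top-k per query) before being applied,
    # plus a sub-pixel (PixelShuffle) upsampling stage.
    import torch
    import torch.nn as nn

    class SparseSelfAttention(nn.Module):
        """Self-attention over flattened spatial positions that keeps only
        the k largest attention scores per query row ("sparse activation")."""
        def __init__(self, dim, topk=16):
            super().__init__()
            self.qkv = nn.Linear(dim, dim * 3, bias=False)
            self.proj = nn.Linear(dim, dim)
            self.topk = topk

        def forward(self, x):                      # x: (B, N, C), N = H*W
            B, N, C = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            attn = (q @ k.transpose(-2, -1)) / C ** 0.5        # (B, N, N)
            k_eff = min(self.topk, N)
            # Keep the k_eff largest scores per row; mask the rest to -inf
            # so they vanish under softmax.
            thresh = attn.topk(k_eff, dim=-1).values[..., -1:]
            attn = attn.masked_fill(attn < thresh, float('-inf')).softmax(-1)
            return self.proj(attn @ v)

    class SubPixelUpsampler(nn.Module):
        """Sub-pixel convolution: predict r*r*C channels at low resolution,
        then rearrange them into an r-times larger feature map."""
        def __init__(self, channels, scale):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
            self.shuffle = nn.PixelShuffle(scale)

        def forward(self, x):                      # (B, C, H, W) -> (B, C, rH, rW)
            return self.shuffle(self.conv(x))

For example, SubPixelUpsampler(64, 3) maps (B, 64, H, W) features to (B, 64, 3H, 3W), matching the 3x setting reported in the experiments.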
2024
Language: Chinese
Pages: 48
Acknowledgements IV
Abstract (Chinese) IV
Abstract V
List of Figures X
List of Tables XI
Chapter 1 Introduction 1
1.1 Research Background and Significance 1
1.2 Research Status in China and Abroad 2
1.2.1 Deep-Learning-Based Image Super-Resolution Methods 2
1.2.2 Deep-Learning-Based Remote Sensing Image Super-Resolution Methods 6
1.3 Research Objectives and Approach 8
1.4 Organization of the Thesis 10
Chapter 2 Overview of Related Techniques 11
2.1 The Concept of Image Super-Resolution 11
2.2 Convolutional Neural Networks 11
2.3 Transformer Networks 14
2.4 Sub-Pixel Space 15
2.5 Sparse Representation 16
2.6 Framework Categories of Deep-Learning-Based Image Super-Resolution Methods 17
2.6.1 Pre-Upsampling Framework 17
2.6.2 Post-Upsampling Framework 18
2.6.3 Iterative Up-and-Down-Sampling Framework 18
2.6.4 Progressive Upsampling Framework 19
2.7 Chapter Summary 20
Chapter 3 A Transformer-Based Super-Resolution Network for Remote Sensing Images 21
3.1 Motivation 21
3.2 Overall Network Architecture 22
3.3 Sparse-Activated Self-Attention Encoder 24
3.4 Multi-Level Fusion Decoder in Sub-Pixel Space 27
3.5 Chapter Summary 28
Chapter 4 Experimental Results and Analysis 29
4.1 Experimental Environment and Parameter Settings 29
4.2 Datasets 29
4.3 Quality Assessment for Reconstructed Remote Sensing Images 30
4.3.1 Peak Signal-to-Noise Ratio 30
4.3.2 Structural Similarity 31
4.4 Super-Resolution Experiments on the UC Merced Dataset 31
4.5 Super-Resolution Experiments on the AID Dataset 34
4.6 Discussion and Analysis of the Network Model 37
4.7 Chapter Summary 39
Chapter 5 Conclusions and Outlook 40
5.1 Summary of Work 40
5.2 Future Work 40
References 42
Author's Biography 48
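Sections 4.3.1 and 4.3.2 of the thesis assess reconstruction quality with peak signal-to-noise ratio (PSNR) and structural similarity (SSIM), the metrics quoted in the abstract. As a quick reference, a minimal NumPy sketch of both follows; the SSIM here is the simplified global (non-windowed) form with the standard constants, so its values will differ slightly from the windowed implementation a thesis would normally use.

    # Reference-only implementations of PSNR and a simplified global SSIM
    # for single-channel images with pixel values in [0, 255].
    import numpy as np

    def psnr(x, y, peak=255.0):
        mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
        return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

    def ssim_global(x, y, peak=255.0):
        x = x.astype(np.float64)
        y = y.astype(np.float64)
        c1, c2 = (0.01 * peak) ** 2, (0.03 * peak) ** 2   # standard K1, K2
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()
        cov = ((x - mx) * (y - my)).mean()
        return ((2 * mx * my + c1) * (2 * cov + c2)) / (
            (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))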