CITYU Theses & Dissertations (Institutional Thesis Repository)
Thesis Details
Author: 陈泳霖
Supervisor: 丁聖勇
Faculty of Data Science
Master of Data Science Programme (Chinese-medium)
Master's degree
2024
Prompt Engineering for Generative Models and Applications in Cultural Heritage Protection
Human-Computer Interaction; Generative Models; Prompt Engineering; Cultural Heritage; Natural Language Processing; User Experience
Public release date: 9/8/2027
Cultural heritage is the physical testimony of the historical, cultural, and scientific achievements of a country or nation, and in the digital era its digital preservation and presentation are of paramount importance. As generative model technology has matured, it has brought new ways of presenting and interacting with cultural heritage. However, generative models have received little study in the cultural heritage field, so how they can be applied to heritage display, and whether they can increase user engagement significantly over traditional exhibitions, remain open questions. Moreover, the output of a generative model depends on the quality of its input, the prompt. If a prompt lacks relevant content, or the model's training corpus lacks relevant material, the generated content can deviate substantially from the character of the heritage itself, even fabricating details, which conflicts with the serious and authentic nature of cultural heritage. Without retraining the model, the prompt text must therefore be enriched to make generative models effective for cultural heritage.
To address these two problems, this study first proposes a generative-model interaction system for cultural heritage that combines 3D reconstruction, text generation, image generation, and digitisation. Second, to address the prompt problem, the study applies Retrieval-Augmented Generation (RAG) to generate prompts automatically. RAG requires only heritage description texts to build a database; given a question, it retrieves similar text fragments, which then serve as prompt references that enrich the generative model's input and strengthen the connection between the generated results and the heritage itself.
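The retrieval step just described (embed heritage description chunks, retrieve the fragments closest to the user's input, and prepend them to the generative model's prompt) can be shown as a minimal sketch. This is an illustration under assumptions, not the thesis's implementation: the embedding model name, the example description chunks, and the prompt template are all placeholders.

```python
# Minimal RAG auto-prompt sketch (illustrative; all texts and the model
# choice are placeholders, not the thesis's actual implementation).
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# 1. Build the "database": embed chunks of heritage description text.
chunks = [
    "Beaded shoes (kasut manek) are hand-sewn slippers of the Baba Nyonya.",
    "Each shoe face is embroidered with hundreds of tiny glass beads.",
    "Common motifs include peonies, phoenixes and other auspicious symbols.",
]
chunk_vecs = model.encode(chunks, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k description chunks most similar to the question."""
    q_vec = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vecs @ q_vec  # cosine similarity, since vectors are unit-norm
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(user_input: str) -> str:
    """Enrich the user's raw input with retrieved heritage context."""
    context = "\n".join(retrieve(user_input))
    return (f"Context about the heritage item:\n{context}\n\n"
            f"Using only details consistent with the context, {user_input}")

print(build_prompt("write a short story about a pair of beaded shoes"))
```

Because the retrieved fragments come from curated heritage descriptions, the enriched prompt ties the model's output to documented characteristics of the artefact instead of leaving the model free to fabricate.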
To verify the effectiveness of the proposed generative-model interaction system for cultural heritage, a user experience evaluation was conducted. Taking Malaysian beaded shoes as the experimental subject, the system was deployed in a real cultural heritage exhibition, and 123 participants took part in a user experience survey. The results show that the generative-model interaction system significantly increased user engagement compared with the traditional exhibition. In addition, to verify the RAG automatic prompt generation, a quantitative analysis showed that the algorithm's outputs correlate strongly with the characteristics of the heritage itself, and that users preferred generation results augmented with the RAG-generated keywords, with a significant difference from results based on user input alone.
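To make the reported comparison concrete, the sketch below runs a two-sample significance test on hypothetical engagement scores; the numbers are fabricated placeholders, and the thesis's actual instruments and statistics may differ.

```python
# Hypothetical engagement comparison (placeholder data, illustrative only).
from scipy import stats

traditional = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0, 3.5]  # traditional exhibition group
generative = [4.2, 4.5, 3.9, 4.4, 4.1, 4.6, 4.0]   # generative-model system group

# Welch's t-test (does not assume equal variances between the two groups).
t_stat, p_value = stats.ttest_ind(generative, traditional, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> significant difference
```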
This study highlights the potential of generative model techniques in the presentation of cultural heritage and proposes concrete solutions to enhance their usefulness and relevance. By integrating text and image generative models with a precise automatic prompt generation algorithm, our system can interpret cultural heritage more authentically while providing a more interactive and educational user experience, offering new perspectives and tools for the sustainable management and presentation of digital heritage.
Year: 2024
Language: Chinese
Pages: 51
Acknowledgements IV
Abstract (Chinese) IV
Abstract (English) V
List of Figures XI
List of Tables XII
Chapter 1 Introduction 1
1.1 Research Background and Significance 1
1.1.1 Research Background 1
1.1.2 Research Significance 3
1.2 Main Contributions 4
1.3 Organisation of the Thesis 5
Chapter 2 Related Work 6
2.1 Digitisation of Cultural Heritage 6
2.1.1 Overview of Digitisation Technologies 6
2.1.2 Technology Applications 7
2.2 Generative Models 8
2.3 State of Prompt Engineering 10
2.3.1 Overview of Prompt Engineering 10
2.3.2 Prompt Engineering in Different Domains 11
2.3.3 Retrieval-Augmented Generation 12
2.4 Chapter Summary 13
Chapter 3 Cultural Heritage Interaction System Based on Generative Models 15
3.1 … 15
3.2 Framework Design 15
3.2.1 Cultural Heritage Model 16
3.2.2 Story Generation Interface 16
3.2.3 Image Generation Interface 18
3.3 Front-End Implementation of the Framework 18
3.3.1 Story Generation 20
3.3.2 Image Generation 20
3.4 Overall Framework 22
3.5 Chapter Summary 23
Chapter 4 Automatic Prompt Algorithm Based on RAG 24
4.1 Overview and Overall Workflow 24
4.2 Building a Vector Database of Cultural Heritage Information 24
4.2.1 Collecting Cultural Heritage Texts 24
4.2.2 Text Segmentation 25
4.2.3 Vectorising Text Chunks 26
4.2.4 Vector Database and Index Construction 26
4.3 Retrieval and Prompt Construction 27
4.3.1 Improving Story Generation 28
4.3.2 Improving Image Generation 28
4.4 Chapter Summary 28
Chapter 5 Evaluation Methods and Results 30
5.1 Selection of the Cultural Heritage Experimental Subject 30
5.2 Evaluation of the Generative-Model-Based Cultural Heritage Interaction System 31
5.2.1 Experimental Method 31
5.2.2 Experimental Results 33
5.3 Evaluation of RAG-Based Prompts 35
5.3.1 Sources of the Text Description Set 35
5.3.2 Comparison of Text Segmentation Methods 36
5.3.3 Comparison of Question-Answering Effectiveness 36
5.3.4 User Evaluation of RAG for Image Generation 38
5.3.5 User Evaluation of RAG for Story Generation 39
5.4 Chapter Summary 40
Chapter 6 Conclusions and Future Work 42
6.1 Conclusions 42
6.2 Future Work 43
6.2.1 Adding Question-Answering Functionality 43
6.2.2 Locally Deployed Models 43
6.2.3 Optimising the Data Collection Process 43
6.2.4 Expanding Creation Types 44
6.2.5 Application to Virtual and Augmented Reality 44
6.2.6 Broadening the Scope of Application 44
References 45
Author Biography 51