Cultural heritage constitutes the physical evidence of the historical, cultural, and scientific achievements of a country or nation, and in the digital era its preservation and presentation in digital form are of paramount importance. As generative model (GM) technology has matured, new ways of presenting and interacting with cultural heritage have emerged. However, generative models remain under-studied in the cultural heritage field, which raises two questions: how generative techniques can be applied to cultural heritage exhibition, and whether they significantly increase user engagement compared with traditional exhibitions. A second issue concerns the dependence of a generative model's output on the quality of its input, i.e., the prompt. When a prompt lacks relevant content, or the model's training corpus lacks relevant material, the generated content may deviate substantially from the characteristics of the cultural heritage itself, or even contain fabrications, which contradicts the serious and authentic nature of cultural heritage. Without retraining the model, the effectiveness of generative models for cultural heritage must therefore be improved by enhancing the prompt text.
To address these two problems, this study first proposes a generative-model interaction system for cultural heritage that combines 3D reconstruction, text generation, image generation, and digitisation. Second, to address prompt quality in this setting, the study employs Retrieval-Augmented Generation (RAG) to generate prompts automatically: only the descriptive texts of the cultural heritage are needed to build a database, from which text fragments similar to the user's question are retrieved and used to enrich the generative model's input, strengthening the connection between the generated results and the cultural heritage itself.
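To make the retrieval step concrete, the following is a minimal sketch of RAG-based prompt generation under stated assumptions: a sentence-embedding model (here sentence-transformers, a hypothetical choice) and cosine similarity for retrieval. The fragments, encoder, and prompt template are illustrative, not the paper's actual implementation.

```python
# Minimal sketch of the RAG prompt-generation step described above.
# Assumptions (not from the paper): sentence-transformers for embeddings,
# cosine similarity for retrieval, and a simple prompt template.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical choice of encoder

# 1. Build the database: embed each descriptive fragment of the heritage item once.
fragments = [
    "Nyonya beaded shoes (kasut manek) are hand-sewn with fine glass beads.",
    "Motifs often feature peonies, phoenixes, and butterflies.",
    # ... remaining descriptive texts of the cultural heritage
]
db = model.encode(fragments, normalize_embeddings=True)  # shape: (n, d)

def build_prompt(question: str, top_k: int = 3) -> str:
    """Retrieve the top-k fragments most similar to the question
    and prepend them to the user's question as context."""
    q = model.encode([question], normalize_embeddings=True)[0]
    sims = db @ q  # cosine similarity (vectors are unit-normalized)
    best = np.argsort(sims)[::-1][:top_k]
    context = "\n".join(fragments[i] for i in best)
    return f"Context:\n{context}\n\nQuestion: {question}"

print(build_prompt("What patterns appear on the beaded shoes?"))
```

The enriched prompt is then passed to the text or image generator, so the generated output is anchored to the descriptive database rather than to the model's training corpus alone.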
To verify the effectiveness of the proposed interactive system, a user experience evaluation was conducted. Using Malaysian beaded shoes as the subject, the system was deployed in a real cultural heritage exhibition environment, and 123 participants completed a user experience survey. The results showed that the generative-model interactive system significantly enhanced user engagement compared with traditional exhibitions. To verify the effectiveness of the RAG-based automatic prompt generation, a quantitative study was then conducted; its analysis showed that the automatically generated prompts were strongly related to the characteristics of the cultural heritage itself, and that users significantly preferred outputs generated with the RAG-derived keywords appended over outputs generated from user input alone.
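As a minimal sketch of the kind of significance test such a between-condition comparison implies, assuming Likert-scale engagement scores and an independent-samples t-test (the paper's exact statistical procedure and data are not restated here; the numbers below are hypothetical):

```python
# Hypothetical comparison of engagement scores between two exhibition conditions.
# Assumptions (not from the paper): 1-5 Likert scores, independent-samples t-test.
from scipy import stats

traditional = [3, 2, 4, 3, 3, 2, 4, 3, 2, 3]  # placeholder scores
generative  = [4, 5, 4, 4, 5, 3, 5, 4, 4, 5]  # placeholder scores

t, p = stats.ttest_ind(generative, traditional)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 would indicate a significant difference
```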
This study highlights the potential of generative modelling techniques for the presentation of cultural heritage and proposes concrete solutions that enhance their usefulness and relevance. By integrating text and image generative models with an algorithm for automatic prompt generation, the system interprets cultural heritage more authentically while providing a more interactive and educational user experience, offering new perspectives and tools for the sustainable management and presentation of digital heritage.