Hybrid Approach to Searching and Processing of Complex Structured Big Data for Building an Integrated Algorithm for Ukraine’s Cultural Heritage Analyzing
DOI:
https://doi.org/10.31649/1997-9266-2025-183-6-127-138
Keywords:
cultural heritage, data processing, data analysis, Knowledge Graph, NER, BigData, digital archives
Abstract
Preserving and analyzing Ukraine’s cultural heritage requires advanced intelligent tools capable of processing complex, multimodal, and heterogeneous data. Traditional methods of information retrieval and analysis often fail to account for the multilingual nature of archives, the presence of handwritten and poorly digitized documents, historical variations in terminology, and the need for fact verification, all of which significantly reduce the effectiveness of data integration from diverse sources. To address these challenges, this study proposes a hybrid approach that combines multilevel web parsing, optical and handwritten text recognition (OCR/HTR), natural language processing (NLP) techniques, mechanisms for detecting duplicates and unreliable facts, and the construction of a knowledge graph employing clustering algorithms, PageRank, Apriori, and ARIMA. A distinctive feature of the proposed system is an adaptive search module that enables automated extraction, structuring, and verification of data, together with an interactive map providing geospatial visualization of cultural heritage figures, implemented using the Leaflet library and OpenStreetMap. The system architecture supports multilayer data processing, ranging from normalization, lemmatization, and named entity recognition to semantic analysis, associative search, and predictive modeling of cultural and historical dynamics. Computational experiments confirmed the efficiency and scalability of the approach, demonstrating stable system performance in real-time conditions. The results highlight the potential of the developed model as the foundation for a unified national information retrieval system for Ukraine’s cultural heritage.
The practical value of this hybrid framework extends to museum studies, archival science, education, and digital humanities research, ensuring standardized access to cultural data, enhancing analytical reliability, and fostering the integration of Ukrainian heritage into the global digital ecosystem. Further development of the system may involve the incorporation of multimodal data sources such as 3D models, audio archives, and blockchain-based provenance verification to strengthen data authenticity and long-term digital preservation.
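The abstract lists PageRank among the algorithms applied to the knowledge graph but does not describe how it is implemented. As an illustration only, the following minimal sketch ranks the nodes of a small directed graph by power iteration. The function name `pagerank`, the damping factor of 0.85, and the toy node labels (`work:1`, `person:A`, `place:X`) are assumptions made for this example and are not taken from the study.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank over a directed graph given as (source, target) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = defaultdict(list)
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every node passes an equal share of its damped rank to each of its targets.
        for source in nodes:
            targets = out_links[source]
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy graph: works point to their authors, authors to a place.
EDGES = [
    ("work:1", "person:A"),
    ("work:2", "person:A"),
    ("work:3", "person:B"),
    ("person:A", "place:X"),
    ("person:B", "place:X"),
]

ranks = pagerank(EDGES)
for node, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{node}: {score:.4f}")
```

In this toy graph the place node collects rank from both person nodes, so it ends up ranked highest, and the person referenced by two works outranks the person referenced by one. A production system would more likely rely on a graph library or a graph database than on a hand-written loop.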
License

This work is licensed under a Creative Commons Attribution 4.0 International License.