LLM-based Feature Extraction Technology for Patient Testing from Textual Reports to Enhance COVID-19 Case Forecasting
DOI:
https://doi.org/10.31649/1997-9266-2024-177-6-135-144
Keywords:
information technology, feature engineering, time series forecasting, Prophet, artificial intelligence, large language models, COVID-19
Abstract
The article focuses on the application of modern large language models (LLMs) to automate the extraction of essential features from analytical textual reports on the COVID-19 pandemic in Ukraine during 2020–2022. These reports encompass a broad spectrum of data, including regional morbidity indicators, testing dynamics, vaccination outcomes, and demographic characteristics of patients. The study explores the integration of these extracted features into time series models to improve the accuracy of epidemic forecasts.
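As a rough illustration of the extraction step, the sketch below assumes the LLM is prompted to return one JSON object per report. The field names (`report_date`, `tests_per_day`, `positive_share`), the sample reply, and the helper `parse_report_features` are hypothetical and are not taken from the article; the actual model call is left out.

```python
import json
from datetime import date

# Hypothetical feature names; the article does not list the exact fields.
EXPECTED_FIELDS = ("tests_per_day", "positive_share")


def parse_report_features(llm_reply: str) -> dict:
    """Turn one JSON reply from an LLM into a dated feature row.

    Missing or non-numeric values become None so they can be imputed
    later instead of silently entering the forecasting model.
    """
    raw = json.loads(llm_reply)
    row = {"ds": date.fromisoformat(raw["report_date"])}
    for field in EXPECTED_FIELDS:
        value = raw.get(field)
        row[field] = float(value) if isinstance(value, (int, float)) else None
    return row


# Made-up reply standing in for a real model response.
sample_reply = (
    '{"report_date": "2021-11-01", "tests_per_day": 41250, '
    '"positive_share": "n/a"}'
)
features = parse_report_features(sample_reply)
```

Rows of this shape, keyed by date, could then be joined to the case-count series and passed to a forecasting model as additional regressors.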
Central to the research is the use of the Prophet model, which was enhanced to account for seasonal changes and anomalies in the data. The study addressed the multi-wave nature of the COVID-19 time series, with its sharp increases and decreases in case counts. Adjustments were made for anomalies caused by changes in quarantine measures, testing policies, and vaccination campaigns, particularly during winter surges.
Optimizing the Prophet model involved advanced parameter tuning using methods such as grid search and stochastic optimization, tailored to the specific epidemiological context in Ukraine. Additionally, the study evaluated the potential of neural network models, including LSTM (Long Short-Term Memory), to analyze time series data. LSTM’s ability to capture nonlinear relationships and process multiple input variables complements traditional methods, providing deeper insights into long-term trends and interdependencies in the data.
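The grid-search part of the tuning described above can be sketched as an exhaustive loop over a parameter grid. `changepoint_prior_scale` and `seasonality_prior_scale` are real Prophet hyperparameters, but the candidate values, the helper `grid_search`, and the placeholder `toy_score` are assumptions made for illustration. In practice the scoring function would fit the model and return a validation error.

```python
from itertools import product

# Real Prophet hyperparameter names; the candidate values are illustrative only.
PARAM_GRID = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [1.0, 10.0],
}


def grid_search(score_fn, grid):
    """Return (best_params, best_score) with the lowest score over the full grid."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Placeholder for 'fit the model with params, return validation error'."""
    return (
        abs(params["changepoint_prior_scale"] - 0.1)
        + abs(params["seasonality_prior_scale"] - 10.0)
    )


best_params, best_score = grid_search(toy_score, PARAM_GRID)
```

An exhaustive grid is easy to reason about but grows multiplicatively with each added parameter, which is one reason to combine it with stochastic search on larger parameter spaces.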
The goal of this study is to develop an effective forecasting tool that integrates LLM-extracted features with advanced modeling techniques. By combining the enhanced Prophet model with neural network approaches such as LSTM, the research aims to significantly improve the accuracy of short- and long-term forecasts. This is particularly crucial for timely public health decision-making during periods of epidemiological uncertainty.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).