LLM-based Feature Extraction Technology for Patient Testing from Textual Reports to Enhance Covid-19 Case Forecasting

Authors

  • A. V. Losenko Vinnytsia National Technical University
  • Ye. M. Kryzhanovsky Vinnytsia National Technical University
  • I. M. Shtelmakh Vinnytsia National Technical University
  • I. V. Varchuk Vinnytsia National Technical University

DOI:

https://doi.org/10.31649/1997-9266-2024-177-6-135-144

Keywords:

information technology, feature engineering, time series forecasting, Prophet, artificial intelligence, large language models, COVID-19

Abstract

The article focuses on the application of modern large language models (LLMs) to automate the extraction of essential features from analytical textual reports on the COVID-19 pandemic in Ukraine during 2020–2022. These reports encompass a broad spectrum of data, including regional morbidity indicators, testing dynamics, vaccination outcomes, and demographic characteristics of patients. The study explores the integration of these extracted features into time series models to improve the accuracy of epidemic forecasts.
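A minimal sketch of how LLM-extracted features might be turned into model inputs, assuming the model is prompted to return one JSON object per report. The field names (`report_date`, `tests_performed`, `positive_share`), the `parse_report_features` helper, and the sample response are hypothetical and not taken from the article; the LLM call itself is omitted.

```python
import json
from datetime import date

# Hypothetical schema: field name -> type the value is coerced to.
EXPECTED_FIELDS = {"tests_performed": int, "positive_share": float}


def parse_report_features(llm_response: str) -> dict:
    """Validate one JSON object returned by an LLM for a single report.

    Fields that are missing or cannot be coerced are stored as None, so
    that downstream code can decide how to impute them.
    """
    raw = json.loads(llm_response)
    row = {"ds": date.fromisoformat(raw["report_date"])}
    for name, cast in EXPECTED_FIELDS.items():
        try:
            row[name] = cast(raw[name])
        except (KeyError, TypeError, ValueError):
            row[name] = None
    return row


# Illustrative response text, not real data from the reports.
sample = '{"report_date": "2021-11-01", "tests_performed": "41250", "positive_share": 0.31}'
features = parse_report_features(sample)
```

Keying each row by date (`ds`) is a design choice that would let the extracted values be joined to the daily case series before modeling.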

Central to the research is the Prophet model, which was enhanced to account for seasonal changes and anomalies in the data. The study addressed the multi-wave nature of the COVID-19 time series, with its sharp increases and decreases in case counts. Adjustments were made for anomalies caused by changes in quarantine measures, testing policies, and vaccination campaigns, particularly during winter surges.
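For orientation, Prophet's standard additive decomposition, with externally supplied features entering as linear regressors, can be written as below. This is the general form of the model rather than the article's specific configuration; treating the LLM-extracted features as the regressors $x_k(t)$ is an assumption about how they are integrated.

```latex
y(t) = g(t) + s(t) + h(t) + \sum_{k} \beta_k \, x_k(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the effect of holidays or other special events, $\beta_k$ the regressor coefficients, and $\varepsilon_t$ the error term.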

Optimizing the Prophet model involved advanced parameter tuning using methods such as grid search and stochastic optimization, tailored to the specific epidemiological context in Ukraine. Additionally, the study evaluated the potential of neural network models, including LSTM (Long Short-Term Memory), to analyze time series data. LSTM’s ability to capture nonlinear relationships and process multiple input variables complements traditional methods, providing deeper insights into long-term trends and interdependencies in the data.
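The grid search mentioned above can be sketched as an exhaustive loop over parameter combinations scored on a hold-out window. The example is an illustration only: it uses simple exponential smoothing as a stand-in forecaster so that it runs without external libraries, and the `alpha` grid and the toy series are assumptions. With Prophet, the score function would instead fit the model with candidate values of arguments such as `changepoint_prior_scale` or `seasonality_prior_scale`; the article does not state which parameters were tuned.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Return (best_params, best_score) for the lowest score over the grid."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def holdout_score(series, horizon):
    """Build a score function that fits on all but the last `horizon` points."""
    train, test = series[:-horizon], series[-horizon:]

    def score(params):
        # Stand-in forecaster: simple exponential smoothing, flat forecast.
        level = train[0]
        for y in train[1:]:
            level = params["alpha"] * y + (1 - params["alpha"]) * level
        return mean_absolute_error(test, [level] * horizon)

    return score


# Toy series, not real case counts.
cases = [10, 12, 15, 19, 24, 30, 37, 45, 54, 64]
best, err = grid_search({"alpha": [0.2, 0.5, 0.9]}, holdout_score(cases, horizon=3))
```

Scoring on the most recent points rather than a random split keeps the evaluation in time order, which matters for epidemic curves with distinct waves.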

The goal of this study is to develop an effective forecasting tool that integrates LLM-extracted features with advanced modeling techniques. By combining the enhanced Prophet model with neural network approaches such as LSTM, the research aims to significantly improve the accuracy of both short- and long-term forecasts. This is particularly important for timely public health decision-making during periods of epidemiological uncertainty.

Author Biographies

A. V. Losenko, Vinnytsia National Technical University

PhD, Assistant of the Chair of System Analysis and Information Technologies

Ye. M. Kryzhanovsky, Vinnytsia National Technical University

Cand. Sc. (Eng.), Associate Professor of the Chair of System Analysis and Information Technologies

I. M. Shtelmakh, Vinnytsia National Technical University

Cand. Sc. (Eng.), Assistant of the Chair of System Analysis and Information Technologies

I. V. Varchuk, Vinnytsia National Technical University

Cand. Sc. (Eng.), Associate Professor of the Chair of System Analysis and Information Technologies

Published

2024-12-27

How to Cite

[1]
A. V. Losenko, Y. M. Kryzhanovsky, I. M. Shtelmakh, and I. V. Varchuk, “LLM-based Feature Extraction Technology for Patient Testing from Textual Reports to Enhance Covid-19 Case Forecasting”, Вісник ВПІ, no. 6, pp. 135–144, Dec. 2024.

Section

Information technologies and computer sciences
