Short-term forecasting of mortality rates based on operational data using machine learning methods
Abstract
The study examines whether short-term predictive models of regional population mortality can be built with machine learning methods (CatBoost), and compares such models for the pre-COVID period (2019) and the COVID period (2020). We used operational data on the number of deaths from the Federal State Statistics Service, supplemented by reference data on the constituent entities of the Russian Federation (demographic and general geographic data, information about healthcare facilities, health system indicators, medical monitoring, risk indicators, etc.).
For the 2019 data, the model error decreased from 13% to 0.5% as the training period lengthened. No such decrease was observed for the 2020 data, where the error varied between 8% and 16%. Adding regional characteristics did not improve forecast accuracy. The aggregated data behaved like a random process, and no predictors were found that significantly affected the causes of death or were significantly associated with them.
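To make the notion of "error as a function of the training period" concrete, the sketch below shows one common way such a curve could be computed: an expanding-window evaluation that refits on a growing history and scores the next few periods by mean absolute percentage error. This is an illustration under assumptions, not the authors' procedure: the abstract does not state which error metric or validation scheme was used, the death counts are invented, and a simple mean forecaster stands in for CatBoost so that the example needs only the standard library. The names `mape`, `mean_forecast`, and `expanding_window_errors` are hypothetical.

```python
from statistics import mean


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed metric)."""
    return 100 * mean([abs(a - p) / a for a, p in zip(actual, predicted)])


def mean_forecast(train, horizon):
    """Stand-in forecaster: repeat the training mean for each future period.

    A trained model (for example a gradient-boosting regressor) could be
    swapped in here; this baseline only keeps the sketch dependency-free.
    """
    level = mean(train)
    return [level] * horizon


def expanding_window_errors(series, min_train, horizon, forecaster=mean_forecast):
    """Return (training length, MAPE) pairs for growing training periods."""
    results = []
    for n in range(min_train, len(series) - horizon + 1):
        train, test = series[:n], series[n:n + horizon]
        results.append((n, mape(test, forecaster(train, horizon))))
    return results


if __name__ == "__main__":
    # Invented monthly death counts for a single region (illustration only).
    deaths = [1510, 1420, 1480, 1390, 1370, 1350, 1400, 1380, 1360, 1410, 1430, 1500]
    for n_train, err in expanding_window_errors(deaths, min_train=3, horizon=1):
        print(f"training periods: {n_train:2d}  error: {err:.1f}%")
```

With a curve of this kind, a steadily falling error (as reported for 2019) and an error that stays in a band regardless of training length (as reported for 2020) are easy to tell apart.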