Using Google Trends for external migration prediction

  • Georgy Bronitsky HSE University
  • Elena Vakulenko HSE University
Keywords: external migration, migration statistics, Russia, Rosstat, Google Trends, search queries, nowcasting, SARIMA, big data

Abstract

International migration statistics are published with a delay of up to several years, which prevents researchers from analyzing migration flows in a timely manner. This article examines a method for forecasting international migration flows from Internet search queries, using flows from Russia to Germany in 2011-2020 as an example. Migration data from Rosstat, German statistics and the OECD were used. The proposed approach addresses the publication lag by producing estimates of migration trends with virtually no time delay. Moreover, in some cases migration events can be predicted before the actual relocation takes place, and the same approach can be used to estimate other statistical indicators. To construct the estimates, we apply methods for increasing the data frequency, which makes it possible to obtain monthly forecasts.
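The abstract does not say which frequency-increasing method is used. As an illustration only, the sketch below shows one simple option, pro-rata disaggregation: each annual total is split across months in proportion to a monthly indicator (for example, a search-interest index), so that the monthly estimates add up to the published annual figure. The function name `disaggregate_pro_rata` and all numbers are hypothetical and are not taken from the paper.

```python
import numpy as np

def disaggregate_pro_rata(annual_totals, monthly_indicator):
    """Split annual totals into monthly values proportional to an indicator.

    Assumption: this is an illustrative technique, not the paper's method.
    monthly_indicator must hold 12 non-negative values per year.
    """
    totals = np.asarray(annual_totals, dtype=float)
    indicator = np.asarray(monthly_indicator, dtype=float).reshape(len(totals), 12)
    yearly_sum = indicator.sum(axis=1, keepdims=True)
    if np.any(indicator < 0) or np.any(yearly_sum == 0):
        raise ValueError("indicator must be non-negative with a positive yearly sum")
    shares = indicator / yearly_sum          # each row sums to 1
    return (shares * totals[:, None]).ravel()

# Hypothetical input: two years of annual totals and a made-up monthly indicator.
annual = [1200.0, 2400.0]
indicator = [1] * 12 + [1] * 6 + [3] * 6
monthly = disaggregate_pro_rata(annual, indicator)

# Monthly estimates preserve the annual totals by construction.
print(monthly.reshape(2, 12).sum(axis=1))
```

Pro-rata splitting keeps the annual totals intact but copies the indicator's within-year pattern directly, so it is only as good as the indicator. Regression-based disaggregation methods relax that assumption.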

NLP methods were used to collect a large set of migration-related search queries. We then estimated a linear regression on Google Trends search query data, which makes it possible to forecast migration figures before Rosstat publishes its statistics. Unlike the seasonal autoregressive integrated moving average (SARIMA) model, the proposed models can account for structural shifts and shocks in ongoing processes as reflected in Internet search queries, providing short-term migration forecasts in real time (nowcasting). The described methods can be applied to other pairs of countries and to the estimation of other statistical indicators.
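The nowcasting idea can be sketched as follows: fit a linear regression of the migration series on search-interest indices over the months for which official figures exist, then apply the fitted coefficients to the latest search data to estimate months not yet published. This is a minimal sketch under stated assumptions, not the authors' implementation. The data are synthetic, and the names `fit_ols`, `predict`, `search_index` and `migration` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict(beta, X):
    """Apply fitted coefficients to a matrix of regressors."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta

# Synthetic stand-ins (assumption): 96 months, two search-interest indices
# on a 0-100 scale, and a migration series generated from them plus noise.
rng = np.random.default_rng(0)
n_months = 96
search_index = rng.uniform(0, 100, size=(n_months, 2))
true_beta = np.array([500.0, 3.0, 1.5])
migration = predict(true_beta, search_index) + rng.normal(0, 5, n_months)

# Treat the first 84 months as published and the last 12 as not yet available.
train, holdout = slice(0, 84), slice(84, None)
beta_hat = fit_ols(search_index[train], migration[train])
nowcast = predict(beta_hat, search_index[holdout])

# Mean absolute error on the held-out months (known here only because
# the data are simulated).
mae = float(np.mean(np.abs(nowcast - migration[holdout])))
print(beta_hat, mae)
```

Because the regressors are observed in near real time, a shock that shows up in search behaviour moves the nowcast immediately, whereas a SARIMA model fitted only to past values of the series reacts after the shock appears in the published data.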

Published
2022-11-01
How to Cite
Bronitsky G., & Vakulenko E. (2022). Using Google Trends for external migration prediction. Demographic Review, 9(3), 75-92. https://doi.org/10.17323/demreview.v9i3.16471
Section
Original papers