Data is an extraordinary driver of innovation. It can be used by many people simultaneously, generate broad societal benefits and increase well-being. There are many examples of how open data benefits the economy and society as a whole. One of the best recent examples is how data enabled the development of COVID-19 diagnostic tests and treatment options, once the sequenced genome of the new coronavirus SARS-CoV-2 was published just three days after the World Health Organization announced the discovery of the virus.
UNDP in Serbia wanted to test how alternative data sources, closer to real time, can be used as an addition to official sources, to gather information and gain new insights into one of the biggest development challenges - depopulation.
The term depopulation stands for the process of continuous decline of population and its ageing, which has multiple and complex socio-economic implications. The societal challenges associated with this process multiply with time. Demographic indicators of the highest quality are necessary in order to respond effectively.
Official statistics produce regular annual estimates of demographic indicators, but these still rely on the traditional census, which is conducted only once every ten years and significantly underestimates the volume of emigration due to methodological limitations. With today's dramatically increased population mobility, both global and local, the reliability of such estimates decreases significantly the further one moves from the census year.
In order to obtain more complete estimates of population movements, UNDP Accelerator Lab in Serbia tested alternative data sources. We used LinkedIn data to identify skills and professions that Serbia is losing, as well as to map destination countries, and Google search data to map the diaspora.
Recognizing the potential of alternative data sources, and with the goal of mobilising collective intelligence, the United Nations Development Programme (UNDP), in partnership with the UN Population Fund (UNFPA) and supported by the German Development Agency (GIZ), organised the Depopulation Data Challenge. We addressed the call to the academic community, private and public sector companies, the tech community and all those interested in this topic.
Many heads are better than one
Collective intelligence is the enhanced capacity that is created when people work together to mobilise a wider range of information, ideas and insights to solve problems and get smarter together. It’s something people have been doing since the beginning of time. What has changed is how collective intelligence has accelerated. Digital technologies enable us to mobilise collective intelligence at a much greater scale and in entirely new ways. We have new data sources at our disposal, such as satellite data, mobile data, social media data, and we can use artificial intelligence (AI) to expand our own intelligence.
We invited teams to collect and combine traditional and alternative data sets, and to use artificial intelligence methods, to gain new insights and new knowledge about depopulation in Serbia. We were counting on exactly that synthesis of human knowledge and digital technologies.
We received a total of 50 applications. In the final phase, which included a detailed elaboration of the initial idea, we evaluated 11 multidisciplinary teams. The four winners were chosen by a jury consisting of representatives of the Cabinet of the Minister without Portfolio in charge of Demography and Population Policies, the Centre for Demographic Research of the Institute of Social Sciences, the Statistical Office of the Republic of Serbia, UNFPA and UNDP.
How many people leave Serbia and where do they go
International migration, as the weakest link in official demographic statistics, was the central theme of the two winning solutions. Their focus was on macro data, i.e. indicators at the state level. The Bootstrappers team focused on the population that emigrated from Serbia, and the InfostudData team focused on potential emigrants.
The Bootstrappers team consisted of three researchers from Harvard University (USA) - experts in the field of IT and data science. The leader of the team was a young scientist from Serbia. The type of data they used is not methodologically consistent with official statistics in terms of definitions and coverage of migrants. Even so, their solution improved knowledge about recent emigration from Serbia along two dimensions.
Using an analysis of relatively easily accessible data on Facebook users, the team provided a far more realistic estimate of the number of Serbian citizens abroad than the census. Compared with about 313 thousand emigrants according to the 2011 census (Stanković 2014), this method indicated that there were as many as 860 thousand of them at the end of 2020. That is much closer to the estimate of the United Nations Department of Economic and Social Affairs, Population Division (2020) of about one million emigrants from Serbia.
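To make the comparison of these three figures concrete, the short sketch below computes how they relate to one another. It uses only the rounded numbers quoted above; the variable names are ours, and the ratios are merely indicative, since the census counts 2011 while the other two figures refer to 2020 and the sources define emigrants differently.

```python
# Rounded figures quoted in the text (persons).
CENSUS_2011 = 313_000        # emigrants according to the 2011 census
FACEBOOK_END_2020 = 860_000  # Facebook-based estimate, end of 2020
UN_DESA_2020 = 1_000_000     # UN DESA estimate, "about one million"

# How many times larger the Facebook-based estimate is than the census count.
facebook_vs_census = FACEBOOK_END_2020 / CENSUS_2011

# Share of the UN DESA estimate covered by each of the other two figures.
facebook_share_of_un = FACEBOOK_END_2020 / UN_DESA_2020
census_share_of_un = CENSUS_2011 / UN_DESA_2020

print(f"Facebook estimate / census count: {facebook_vs_census:.2f}x")
print(f"Facebook estimate as share of UN DESA: {facebook_share_of_un:.0%}")
print(f"Census count as share of UN DESA: {census_share_of_un:.0%}")
```

In other words, the Facebook-based figure is roughly two and three-quarter times the census count and covers most of the UN estimate, while the census count covers less than a third of it.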
The list of 82 destinations includes those for which official data is missing or is not updated regularly, which is especially important in the case of the most popular destinations. However, the main contribution of this solution is reflected in the possibility to monitor changes in the number and key demographic characteristics of emigrants with a relatively high frequency of updates.
Quarterly snapshots of the situation provide an opportunity for better identification of the drivers of change in migration dynamics, such as economic disturbances, pandemics, wars, climate factors, etc. Thus, this analysis clearly detected the impact of the COVID-19 pandemic on changes in the number of emigrants originating from Serbia - first through a decline that stopped before mid-2021 and then through a slight increase - indicating that pre-pandemic migration flows are likely to continue after the initial shock of abrupt border closures at the end of the first quarter of 2020.
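As an illustration of how quarterly snapshots can be read, the sketch below computes quarter-over-quarter changes and finds the quarter where a decline bottoms out. The series is an invented index (first quarter = 1000), not the team's data, and the function names are hypothetical; it only mirrors the general shape described above, a decline followed by a slight increase.

```python
def quarterly_changes(snapshots):
    """Return (label, percent change vs. previous quarter) for each quarter after the first.

    `snapshots` is a chronologically ordered list of (label, value) pairs.
    """
    changes = []
    for (_, previous), (label, current) in zip(snapshots, snapshots[1:]):
        changes.append((label, (current - previous) / previous * 100.0))
    return changes


def trough_quarter(snapshots):
    """Return the label of the quarter with the lowest value."""
    return min(snapshots, key=lambda item: item[1])[0]


# Invented index values for illustration only (NOT real emigrant counts).
toy_index = [
    ("2020Q1", 1000),
    ("2020Q2", 960),
    ("2020Q3", 940),
    ("2020Q4", 930),
    ("2021Q1", 925),
    ("2021Q2", 935),
]

changes = quarterly_changes(toy_index)
for label, pct in changes:
    print(f"{label}: {pct:+.1f}%")
print("Lowest point:", trough_quarter(toy_index))
```

With snapshots this frequent, the turning point can be located to within a quarter, which is not possible with estimates anchored to a census taken once every ten years.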
Another dimension of this solution, based on the analysis of data from Microsoft Academic - a search engine for academic publications, provided an estimate of the outflow of scientists from Serbia over the last two decades, which is one of the most intriguing issues regarding our recent emigration.
The team concluded that the productivity of Serbian scientists increases significantly if they are employed in renowned foreign institutions. On average, they publish 2.3 more papers per year than researchers working at state universities in Serbia. A particularly significant contribution of this solution is the conclusion that scientists who stayed in Serbia and collaborated with renowned scientific institutions abroad also saw a large increase in scientific productivity. They publish an average of 1.9 more papers per year than colleagues who did not engage in this type of international collaboration. This is a very clear message to the scientific community and policy makers in Serbia about the benefits that circular migration of the most educated part of the population can have for our country.
The departure of young people to study abroad or attend education programmes and training does not necessarily lead to the famous 'brain drain'. It can also lead to networking and cooperation that benefits the country of origin through the transfer of knowledge, technology and experience, and with adequate state support many young scientists could return and pursue careers in their country.
An important source of information for understanding and predicting migration flows is the analysis of supply and demand for labour abroad, as a manifestation of the so-called push-pull mechanism that underlies economically induced migration.
The InfostudData team, composed of nine members of the company Infostud from Subotica, analysed the ads for jobs abroad, which were published on the web portals Poslovi.infostud.com and NajStudent.com from 2013 until today.
The goal of this solution is to better understand the intentions of the demographically most vital population of Serbia to move abroad. This kind of analysis illuminates both sides of the ‘push-pull’ mechanism, the repulsive and attractive factors.
Although the intentions to move out do not necessarily lead to an actual departure, one of the most important contributions of this solution is that it indicates the degree to which the supply and demand of jobs coincide, i.e. what are the differences in terms of the type of employment and level of education between jobs offered by foreign employers and jobs wanted by the citizens of Serbia.
The results show the difference between jobs that can lead to long term or permanent emigration (predominantly to Germany) and those that are purely seasonal (countries in the region of the former Yugoslavia). Finally, this research suggests that there is a marked overlap between supply and demand in specific occupations that are in short supply both in Serbia and destination countries. This could clearly indicate to the decision-makers those segments of the labour force in Serbia where the wage gap in relation to rich countries seems to be the key motivation for leaving.
A close look at the population in Serbia
Depopulation in its basic meaning refers to a decline in the total population, which is the central theme of the other two solutions. They focused on micro data, considering the spatial dimension of depopulation. The Geoanalysts team ‘overlapped’ data from different types of sources to get a more complete picture than that painted by demographic statistics. The PopInsight team took an innovative approach by analysing mobile phone traffic to predict the depopulation process.
The solution of the Geoanalysts team, which consisted of six researchers from the Geographical Institute "Jovan Cvijić" of the Serbian Academy of Sciences and Arts, made a step forward in understanding the spatial distribution of the population of Serbia depending on the size and geographical position of the settlement. Sets of geospatial data from different sources, such as night lighting, road network density, distance from traffic hubs, have been combined to indicate spatial, morphological and population changes over the past few decades.
Because it combines traditional and alternative data sources, this solution gives users studying demographic processes an insight into micro data, i.e. population distribution at the settlement level. This brings the spatial identification of depopulation much closer to the real situation on the ground than classical demographic sources can show. The main contribution of this solution is the interactive cartographic display of as many as 12 indicators of depopulation over a long period of time.
The user is able to see clearly what the reduction and ageing of the population means for people living in cities - from the smallest to the largest - and what it means for those in rural areas, especially peripheral and mountainous ones. The web platform of this solution contains layers of high-resolution georeferenced data, which can be an excellent input for more complex analyses in various domains, including predicting population changes.
The PopInsight team, made up of six researchers from the Biosense Research Institute in Novi Sad, went a step further in trying to calculate the specific risks of depopulation at the municipal level by combining official demographic data with alternative sources. This solution is based on the analysis of anonymized and aggregated micro data on all types of telecommunication activities of mobile users of the telco operator Telekom Srbija during the first half of 2020, obtained thanks to the partnership that UNDP has established with this company for the needs of this challenge.
Advanced methods of statistical analysis of telecommunications traffic have made it possible to identify differences in the level of activity between municipalities and determine the degree of their interconnectedness. Correlations between indicators derived from telecommunications data and official demographic statistics, calculated using machine learning models, served to formulate the team's main finding - the weaker a municipality's telecommunications links with other municipalities, the greater its risk of depopulation.
The web platform of this solution provides users with an extremely rich set of over 150 detailed mobile traffic indicators, which open the possibility for further analyses and demographic interpretations. The potential for recognizing patterns of daily and internal migration through the analysis of mobility data is particularly pronounced. This contribution of the solution has only been hinted at so far, and it could be of great importance for understanding these processes.
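The kind of relationship described here can be sketched with a plain Pearson correlation between a connectivity indicator and population change per municipality. This is a simplified stand-in, not the team's machine learning models, and the indicator names and all values below are invented purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy data, one value per municipality (NOT real results):
# a normalised index of telecom links with other municipalities, and
# population change in percent over some period.
connectivity_index = [0.2, 0.4, 0.5, 0.7, 0.9]
population_change_pct = [-12.0, -9.0, -6.5, -3.0, 1.0]

r = pearson(connectivity_index, population_change_pct)
print(f"Pearson r = {r:.3f}")
```

In this toy data, a strongly positive coefficient means that weakly connected municipalities are the ones losing population fastest, which is the direction of the finding above. A correlation on its own does not establish that weak connectivity causes depopulation.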
Opportunities for further improvement
The most common objections to alternative data sources concern their conceptual inconsistency with official demographic statistics, i.e. the credibility of indirect indicators that do not directly measure demographic phenomena. The main drawback is that such sources either do not provide information on key demographic structures (gender, age, partnership status, education, employment, etc.) that are needed for planning and decision-making in vital government systems, or relate to only one aspect of population dynamics (migration). The key advantage of the presented alternative solutions is that they offer more up-to-date information than official demographic statistics.
Also, the results of all four solutions have a strong forecasting capacity, which traditional sources lack, thanks to the machine learning algorithms used to process the big data on which these alternative sources are based. For example, if the InfostudData team's analysis of labour supply and demand abroad were conducted quarterly and evaluated against the Bootstrappers team's results on changes in the volume of emigration from Serbia, an estimate could be made of the number of Serbian citizens intending to emigrate.
Also, the potential of geo-spatial linking of different types of data for identification and forecasting of demographically endangered zones is obvious, as shown by the teams Geoanalysts and PopInsight. If all other telecommunications operators in Serbia provided the PopInsight team with access to their data, the capacity of the results of this solution would drastically increase, especially in terms of analysis of roaming traffic, i.e. international migration.
In that sense, it is important that all four solutions and data obtained by the teams are open and available for further research or creation of new products, to anyone interested in using them in order to contribute to a better quality of life in Serbia.
Despite the current limitations, these four solutions suggest that alternative data sources can serve as a valid corrective and complement to the official data on population dynamics. The Statistical Office of the Republic of Serbia has already announced that it could use the Bootstrappers solution as an additional source in making its own external migration estimates. Given that digitization is becoming inevitable in all spheres of life, it seems that some of the alternative sources have a clear capacity to grow into regular demographic statistics in the foreseeable future.
Depopulation Data Challenge winners:
Bootstrappers
InfostudData
Geoanalysts
PopInsight
References:
Peach, K., Berditchevskaia, A., Mulgan, G., Lucarelli, G., & Ebelshaeuser, M. (2021). Collective Intelligence for Sustainable Development: Getting Smarter Together. UNDP Accelerator Labs, Nesta. Retrieved from https://smartertogether.earth/download-report
Stanković, V. (2014). Srbija u procesu spoljnih migracija. Beograd: Republički zavod za statistiku. Retrieved from https://pod2.stat.gov.rs/ObjavljenePublikacije/Popis2011/Inostranstvo.pdf
UNDP Serbia (2021). Predstavljena saznanja o depopulaciji u Srbiji [Press release]. Retrieved from https://www.rs.undp.org/content/serbia/sr/home/presscenter/articles/2021/predstavljena-nova-saznanja-o-depopulaciji-u-srbiji.html
United Nations (2015). Transforming Our World: The 2030 Agenda for Sustainable Development. New York: United Nations. Retrieved from https://sdgs.un.org/2030agenda
United Nations (2020). International Migrant Stock 2020. United Nations Department of Economic and Social Affairs, Population Division. Retrieved from https://www.un.org/development/desa/pd/content/international-migrant-stock
Note: Part of this blog was originally published in the scientific journal "Stanovništvo", Vol 59 No 2 (2021), in December 2021.