Unlocking Knowledge with Open Data
January 23, 2023
Open data contributes to the quality, efficiency, transparency, and accountability of public services. It drives economic growth by supporting innovation and informing decision-making.
According to Robert Merton, widely considered the founder of the sociology of science, research should be freely accessible to all. One of the Mertonian norms of good scientific research is that each researcher must allow knowledge to move forward. In this vein, the Nobel Prize laureate Elinor Ostrom identified open data as a new kind of “public good.” According to her, open data enriches the common stock rather than depleting it.
Governments across the world are starting to embrace this viewpoint by treating data as a public good. However, accessing data on Kosovo proved challenging for me until I discovered the Millennium Challenge Corporation’s (MCC) open database.
In March 2020, as the world was entering a lockdown, I was searching for a topic for my master’s dissertation that was relevant to a developing country, was relatively unexplored, and had data available to support it. I was doing a Master’s in Development Economics at the University of Sussex, Brighton, UK, and I decided to study unemployment duration in Kosovo, my home country. Both high unemployment and long-term unemployment are widespread in Kosovo, but the latter remains largely under-studied.
Obtaining a dataset, however old, for this research was difficult due to the policies of the Agency of Statistics in Kosovo (ASK), which does not permit use of datasets outside its offices in the capital city. I could not travel back due to the pandemic lockdown and therefore considered changing my dissertation topic to one with data available to support it.
Just as I was about to settle on using macroeconomic data or exploring topics from other countries, I found the 2017 Labour Force and Time Use Survey in Kosovo from the MCC’s Kosovo Threshold Program. The dataset was publicly available online, easy to use, and had 32,000 observations with plenty of variables for exploring the reasons behind unemployment duration in my country. The data package came with a report summarizing the main findings, a user manual, and the full survey in Albanian, English, and Serbian. All manipulations to variables were explained in detail, including the methods used to protect the respondents’ privacy. In addition, quality assurance measures, such as the training and supervision that enumerators received, were explained.
Discrepancies between ASK and MCC statistics for important variables, such as the unemployment rate, were also accounted for. Taking a dataset from its raw form to a usable structure took most of my classmates a couple of months, about two-thirds of the time available to complete the whole dissertation. In my case, because the dataset had already been cleaned, this process took a couple of weeks. As such, I was able to spend the bulk of my time focused on analysis, rather than making the data usable.
My original hypothesis was that, given the high inflow of remittances and the high rate of educated labor, people were voluntarily unemployed, holding out for higher-paying or higher-skilled jobs. Instead, my analysis showed that there were not enough jobs in the market, in line with the experience of other developing countries. The model I used was robust, and the high number of observations made the results more reliable. I ended up graduating with distinction and with an opportunity to publish my thesis.
Having such high-quality data is important not only to researchers but to the private sector as well, given that reliable research produces reliable decisions. The dataset, however, was not easy to find. Many of my colleagues found out about it through my dissertation and were surprised they had not come across it before. I believe datasets of this quality should have been publicized more widely. That is why I was excited to learn that MCC recently launched a new MCC Evidence Platform to better publicize and share its many knowledge products. The Platform should support researchers like me in accessing and using evaluation reports and datasets from over 35 countries.
I now work with data as part of my new job as the economist of the Compact Development Team (the interim implementing entity) in Kosovo. As Kosovo is finalizing its threshold program, it is also developing its first compact. The Kosovo Compact is designed to address the constraint of an unreliable electricity supply, align with the Government of Kosovo’s national development and energy priorities, and lead to poverty reduction and sustainable economic growth in the country. The compact consists of three projects: 1) the Energy Storage Project, 2) the Just and Equitable Transition Acceleration Project, and 3) the American Catalyst Facility for Development Project. As the economist of the team, I am now in charge of ensuring that the data generated is reliable not only for the design of the compact, but also for other researchers to use, in the spirit of the Mertonian norms of good scientific research.