Data set “Big Data of the Modern Global Economy: Digital Platform for Data Mining – 2020”
In a digital economy, data becomes the centerpiece of economic activity and the most valuable resource. Scientific and technological progress now makes it possible to mine data with artificial intelligence (AI) using Big Data processing technologies. The Internet of Things (IoT), Ubiquitous Computing (UC) and other breakthrough technologies of Industry 4.0 make it possible to keep a complete record of economic activity, ensuring its transparency, controllability and forecastability.
However, the inconsistency and fragmentation of information are a severe barrier to optimizing its management. Using advanced technological capabilities, some international organizations create datasets to simplify the management of the statistics that are presented to them. The dataset of the International Monetary Fund (IMF) can be cited as the most successful example: it groups countries into economic and geographic categories, allows users to select any indicators from the proposed list, and contains forecast data for the period until 2022.
The World Economic Forum (WEF) started issuing an annual dataset (Report Reader) in 2018. This dataset has limited capabilities: it allows sorting data either by country or by indicator, and the list of indicators is available only in a pop-up window and is not immediately visible, which makes the dataset harder to manage. The World Bank (WB) offers data for a wide range of fields, but the only way it offers to optimize data management is to import the data into Microsoft Excel. Apart from that, the World Bank data are unstructured; even combining samples in Microsoft Excel is difficult because the number of countries for which data are presented differs from sample to sample.
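The difficulty of combining samples that cover different sets of countries can be illustrated with a minimal Python sketch. It is not how any of the organizations above process their data; the function name `align_samples`, the indicator names and all figures are hypothetical and serve only to show the alignment step.

```python
from typing import Dict, Optional

def align_samples(
    samples: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, Optional[float]]]:
    """Combine per-indicator samples into one table keyed by country.

    A country that is absent from a sample gets None for that indicator,
    so differing country coverage stays visible instead of breaking the merge.
    """
    # Union of all countries that appear in at least one sample
    countries = sorted({c for values in samples.values() for c in values})
    return {
        country: {name: values.get(country) for name, values in samples.items()}
        for country in countries
    }

# Hypothetical samples with different country coverage (illustrative figures only)
samples = {
    "indicator_a": {"Brazil": 1.0, "Canada": 2.0, "India": 3.0},
    "indicator_b": {"Canada": 20.0, "India": 30.0, "Japan": 40.0},
}
table = align_samples(samples)
```

Here `table` has one row for each of the four countries, and the gaps (Brazil for `indicator_b`, Japan for `indicator_a`) are explicit `None` values rather than misaligned rows.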
In current Russian statistics, for example those of the Federal Service of State Statistics (Rosstat, GKS) and the Scientific Research Institute “Higher School of Economics” (SRI HSE: https://www.hse.ru/primarydata/ice2019kr), most of the data are not presented in digital form (they are published as images that cannot be edited); moreover, the composition and names of the reported statistical indicators change annually. This significantly complicates time series analysis (the study of the dynamics of economic phenomena and processes) or even makes it completely unfeasible.
Thus, datasets are highly relevant for contemporary science, since they form the backbone of highly efficient empirical research. The lack of datasets complicates and hampers applied research, resulting in the prevailing development of theoretical science and an inability to identify new problems and significant consistent patterns for their solution. International organizations recognize the growing need of contemporary science for datasets, but some of their small-scale initiatives fail to solve this problem.
The Institute for Scientific Communication (ISC) contributes to solving this problem by creating the first dataset in Russian that combines the leading world statistics. The ISC dataset generates Big Data of the modern global economy and constitutes a digital platform for data mining.
The dataset contains indicators for the most relevant lines of economic research:
Advantages of the ISC dataset:
Reliability: the dataset combines statistics of relevant international organizations, in particular IMD, WEF, WIPO, WB, Numbeo, New Economic Foundation and Sustainable Development Solutions Network;
Relevance and representativeness: the dataset contains up-to-date figures (as of year-end 2019), which form the basis of empirical research in 2020;
Worldwide coverage: the dataset contains statistics for the complete sample of modern countries of the world, which offers ample opportunities for worldwide analytics;
Consistency: collection and systematization of basic statistical data in the common dataset;
Distinct structure: the dataset is divided into topics to make managing it as simple, fast and user-friendly as possible;
Informational content: the dataset presents relevant international statistics in Russian;
Availability of templates: the dataset offers ready-made data templates for G7 countries (developed countries), BRICS countries (developing countries), CIS countries and EEU countries, which speeds up the selection of the necessary data for economic experiments that compare countries of the main categories in real time;
Data import: the dataset allows selecting the necessary information and importing it into Microsoft Excel for further analysis;
Interactivity: the dataset allows sorting and combining various data and integrating them into a common array in exactly the way each user requires;
Work on the blockchain principle: the dataset allows users to share, change and process information on demand while the initial data remain unchanged, which is convenient and safe.
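Two of the advantages above, country templates and processing that leaves the initial data unchanged, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the dataset's actual implementation: the `TEMPLATES` mapping, the `select` function, the indicator names and all figures are hypothetical, and only the G7 and BRICS templates (with their 2019 membership) are spelled out.

```python
import copy
from typing import Dict, Iterable

# Hypothetical templates; CIS and EEU lists are omitted for brevity
TEMPLATES = {
    "G7": ["Canada", "France", "Germany", "Italy", "Japan",
           "United Kingdom", "United States"],
    "BRICS": ["Brazil", "Russia", "India", "China", "South Africa"],
}

def select(
    data: Dict[str, Dict[str, float]],
    countries: Iterable[str],
    indicators: Iterable[str],
) -> Dict[str, Dict[str, float]]:
    """Return a copied subset of the data for the given countries and indicators.

    The source table is never modified; countries missing from it are skipped.
    """
    wanted = set(indicators)
    return {
        country: {k: copy.deepcopy(v) for k, v in data[country].items() if k in wanted}
        for country in countries
        if country in data
    }

# Hypothetical source table (illustrative figures only)
data = {
    "Brazil": {"indicator_a": 1.0, "indicator_b": 10.0},
    "India": {"indicator_a": 3.0, "indicator_b": 30.0},
    "Japan": {"indicator_a": 4.0, "indicator_b": 40.0},
}

subset = select(data, TEMPLATES["BRICS"], ["indicator_a"])
subset["Brazil"]["indicator_a"] = 99.0  # the user edits a working copy
```

Because `select` returns copies, the edit to `subset` does not propagate back to `data`, which mirrors the idea that users can change and process information while the initial data remain intact.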
The data set was developed by Elena G. Popkova, D.Sc. Economics, Professor, President of the Institute of Scientific Communications.