How to use big data in government statistics
In today’s world, science and technology are advancing with each passing day. Modern information technologies such as the Internet, cloud computing and big data have profoundly changed the way people think, produce, live and learn. The convergence of information technology with the economy and society has triggered explosive growth in data, which has become an important factor of production and a basic national strategic resource. In recent years, the National Bureau of Statistics has introduced a series of important measures to promote the application of big data. It has set out the general approach of "overall design, taking the lead in tackling key problems, starting with the easy before the difficult, and making breakthroughs in specialized fields" and the working goal of "building the second track of government statistical data sources in China", and has steadily promoted the application of big data in government statistics and accelerated the deep integration of big data with government statistical work.
First, what is big data?
Big data is regarded as a new type of strategic resource that can help present economic and social development comprehensively, predict it accurately and support intelligent decision-making. A broad consensus on the concept of big data has now been reached, although some details are still disputed. China’s former General Administration of Quality Supervision, Inspection and Quarantine and the Standardization Administration of China issued the national standard "Information Technology Big Data Terminology" (GB/T 35295-2017) on December 29, 2017, and it officially took effect on July 1, 2018. The standard defines big data as "data comprising a large number of data sets, characterized by large volume, diverse sources, rapid generation and variability, which is difficult to process effectively with traditional data architecture". This definition from the field of information technology can serve as an important reference in other fields. In government statistics, big data is usually regarded as data that is gathered through a variety of collection methods and integrates multiple data sources. It is an integration of data, methods and technology: the data is processed and mined at high speed using modern information technology and architecture, and it offers high application value and decision support.
Second, what are the main characteristics of big data?
It is generally believed that the main characteristics of big data can be summarized as four "V"s (see the figure):
[Figure: The main characteristics of big data]
First, the data volume is huge (Volume). The scale of the data sets held by human society has grown from gigabytes (GB) to terabytes (TB) and petabytes (PB), and is now even measured in exabytes (EB) and zettabytes (ZB).
Second, the application value is huge (Value). After targeted collection, cleaning and analysis, big data has application value for, and plays a supporting role in, government decision-making, business operation and mass consumption. If big data and traditional data can be deeply integrated and organically combined, new information and knowledge may be generated. Using and processing big data means quickly "refining" the value of the data through powerful machine algorithms.
Third, the data types are varied (Variety). Big data mainly includes structured, semi-structured and unstructured data, such as audio, video, pictures, web logs, geographical location information and other types of data. The proportion of unstructured data is high and increasing, which poses a great challenge to conventional data analysis tools.
Fourth, the generation speed is fast (Velocity). Big data is often generated rapidly and in real time, in the form of data streams. The extensive and in-depth use of mobile phones, the Internet of Things, tablet computers, the mobile Internet and various sensors has made it easier to generate big data ever faster. Processing big data requires non-traditional technical means and new infrastructure in order to solve the problems of fast computing and real-time storage.
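The ladder of units mentioned under Volume can be made concrete with a little arithmetic. The following is a minimal Python sketch, assuming decimal (SI) units in which each step up is a factor of 1,000; the function names `to_bytes` and `humanize` are illustrative only and do not come from the source or from any standard.

```python
# Decimal (SI) storage units; each step up the list is a factor of 1,000.
# Assumption: decimal prefixes are meant, not binary ones (1,024-based).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value expressed in one of UNITS to a number of bytes."""
    return value * 1000 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value below 1,000."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit that fits, or at ZB if nothing larger exists.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


# One petabyte expressed in gigabytes, and a raw byte count made readable.
gb_per_pb = to_bytes(1, "PB") // to_bytes(1, "GB")
label = humanize(5 * 10**15)
```

Under these assumptions, one PB corresponds to one million GB, and one ZB to one million PB, which gives a sense of how far each step in the sequence GB, TB, PB, EB, ZB extends the scale.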
Third, what are the main types of big data applied in government statistics?
According to the Guiding Opinions on the Statistical Application of Non-traditional Data (Guo Tong Zi [2017] No.160), jointly issued by the National Bureau of Statistics and the National Development and Reform Commission, big data is the main body of non-traditional data, and in many cases the term can be used to refer to non-traditional data as a whole. Specifically, big data refers to data that government statistics obtains by means other than traditional statistical surveys (some foreign institutions also call it "second-hand data"), including administrative records of government departments, business records, Internet data, electronic device sensor data and other big data. The main differences between big data and traditional survey data are as follows (see the table below):
Differences between Big Data and Traditional Survey Data
Fourth, China’s application of big data in government statistics is at the forefront of the world.
As the "second track" (or emerging track) of government statistical data sources, big data has a wide range of applications, covering almost the whole statistical process, including data collection, processing, storage, analysis and release. In recent years, China’s government statistics system has actively applied big data and achieved remarkable results. In fields such as accounting, industry, energy, investment, trade and the economy, population, society, science and technology, agriculture, prices, households and the service industry, big data such as departmental administrative records and electronic data from the Internet is widely used to supplement conventional statistical survey data and make survey results more scientifically sound. Applying big data methods to the detection of data quality problems and to data quality audit and evaluation has improved the quality of statistical data, provided new technologies and means to curb statistical fraud, and played a positive role in improving the accuracy and reliability of statistical data. Big data is used to carry out statistical evaluation in specialized fields, make up for the shortcomings of conventional statistical surveys, improve survey methods and data production methods, expand the system of survey indicators, and raise the quality and efficiency of statistical surveys. Big data is also used to improve data processing, analysis and sharing mechanisms, further strengthen the capacity to develop and apply data, and enhance the accuracy and timeliness of statistical analysis, monitoring and early warning. On the whole, China’s application of big data in government statistics is currently at the forefront of the world, and it started from basically the same line as the major developed countries and regions.
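As one illustration of how administrative records might support a data quality audit, the sketch below compares values reported in a survey with the corresponding administrative records and flags large gaps for manual review. It is a simplified assumption made for illustration, not the method used by the National Bureau of Statistics; the function name `flag_discrepancies`, the 10% tolerance, and the sample identifiers and figures are all hypothetical.

```python
def flag_discrepancies(survey, admin, tolerance=0.10):
    """Return IDs whose survey value differs from the administrative value
    by more than `tolerance`, measured relative to the administrative value."""
    flagged = []
    for unit_id, reported in survey.items():
        recorded = admin.get(unit_id)
        # Skip units with no usable administrative record to compare against.
        if recorded is None or recorded == 0:
            continue
        if abs(reported - recorded) / abs(recorded) > tolerance:
            flagged.append(unit_id)
    return flagged


# Hypothetical figures: survey-reported values and administrative records.
survey_values = {"E001": 120.0, "E002": 80.0, "E003": 45.0}
admin_values = {"E001": 118.0, "E002": 50.0}

to_review = flag_discrepancies(survey_values, admin_values)
```

In this toy data, only the unit whose reported value is far from its administrative record would be flagged; the unit without an administrative record is simply skipped rather than treated as an error. A real audit would need agreed definitions, reference periods and matching rules across the two sources.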
(Written by: Qi Haiqi, He Qiang)