Let’s focus on broadening scope of data collection to make statistical system more comprehensive

Pulak Ghosh, Soumya Kanti Ghosh, Sumit Agarwal

(Illustration: C R Sasikumar)

Last week was unkind to global markets as fears that the coronavirus would turn into a worldwide pandemic weighed on them, India included (Indian markets were also hit by the news on Yes Bank). Amid all this, the latest GDP data underwent significant revisions that have gone largely unnoticed. In the last few years there has been a lot of noise about data revisions. While part of it requires closer examination, we must be fair to our statistical system: such revisions are largely a matter of due diligence and happen globally.

Let us first look at the history of GDP data revisions. The first table shows the extent of GDP data revisions since FY15, when the new series was introduced. Its first column shows the simultaneous revisions that have taken place over the years. The NSO releases the first estimates for any fiscal year in January, revises them in February and then again in May.

At the same time, it revises the previous year's estimates in February, alongside the February data release. The primary criticism of the current year's data, apparently, is that the February revisions for 2019-20 and the fourth revision for 2018-19 are almost identical, implying that the sanctity of 5 per cent growth was statistically protected.


Source: SBI Research

Let us examine the criticism of such revisions based purely on data. First, there is precedent for the February revisions to the first- and second-quarter estimates of the current financial year. For example, while the cumulative downward revision in the current fiscal was close to Rs 30,000 crore, in FY19 there was an even larger upward revision of roughly Rs 86,000 crore in February.

Second, is there precedent for such large first-time revisions? Yes, there have been such revisions since 2014-15. In 2018-19, the first-time data were revised by a sharp Rs 1.43 lakh crore, while in 2017-18 they were revised by an even larger Rs 1.69 lakh crore.

Third, the simultaneous revisions are mostly in the same direction, though different in magnitude, and hence it is unfair to say that the 2018-19 data was revised downwards to protect the 2019-20 numbers.

The problem is that global and domestic conditions changed so swiftly in 2017-18 and 2018-19 that it was virtually impossible to predict the outcome initially. In 2017-18, the estimates were revised progressively higher; in 2018-19, the interim estimates were higher but were drastically scaled down later as the impact of the NBFC crisis unfolded.


Source: SBI Research

We would like to point here to the example of the US Fed, which also missed the possibility of the US economy bouncing back in 2018 on the back of tax cuts: in 2015 it had projected the economy to expand by only 2 per cent, only to change that to 3 per cent in 2018 (a revision almost on par with the scale of revisions in India).

Such unconditional bias commonly arises because the statistical reporting agency produces releases according to an asymmetric loss function. For example, it may prefer an optimistic (or pessimistic) first release, followed by a more pessimistic (or optimistic) one later. Intuitively, one might argue that the cost of revising preliminary data downwards is higher than the cost of revising them upwards. This asymmetric loss function matters less at the reporting stage than at the forecasting stage. A statistical reporting agency like the NSO simply does not have all the data at hand and has to forecast the values of data yet to be collected. It is at that moment that the asymmetric loss function comes into play. So we must be careful not to attribute ulterior motives to the NSO when interpreting its data revisions, as we too often tend to do.
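The mechanism can be illustrated with a minimal Python sketch, assuming the asymmetric loss takes the pinball (asymmetric linear) form; the article does not specify a functional form, and the growth distribution and the weight tau below are hypothetical. If a preliminary figure that later has to be revised down costs three times as much per unit as one that is later revised up (tau = 0.25), the loss-minimising preliminary estimate is the 25th percentile of the agency's uncertainty rather than the median.

```python
import numpy as np

def asymmetric_loss(forecast, outcomes, tau):
    """Average pinball (asymmetric linear) loss of one preliminary estimate.

    err > 0: the estimate was too low and is later revised up (cost tau per unit).
    err < 0: the estimate was too high and is later revised down (cost 1 - tau per unit).
    """
    err = outcomes - forecast
    return np.mean(np.where(err >= 0, tau * err, (tau - 1) * err))

def best_forecast(outcomes, tau, grid):
    """Return the grid point with the lowest average asymmetric loss."""
    losses = [asymmetric_loss(f, outcomes, tau) for f in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
# Hypothetical distribution of the "true" growth rate (per cent) the agency is unsure of.
outcomes = rng.normal(loc=5.0, scale=1.0, size=10_000)
grid = np.linspace(2.0, 8.0, 601)  # candidate preliminary estimates, step 0.01

for tau in (0.5, 0.25):
    f = best_forecast(outcomes, tau, grid)
    mean_revision = np.mean(outcomes - f)  # average later revision (final minus preliminary)
    print(f"tau={tau}: preliminary={f:.2f}, "
          f"{tau:.0%} quantile={np.quantile(outcomes, tau):.2f}, "
          f"mean later revision={mean_revision:+.2f}")
```

With tau = 0.5 the loss is symmetric, the preliminary figure sits at the median and revisions average out near zero; with tau = 0.25 it sits near the 25th percentile and later revisions are upward on average, which is the kind of unconditional bias described above, arising without any ulterior motive.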

However, we must also add that, unlike many countries across the world, India still lags significantly in its use of data analytics. Some current methods of data collection rely mostly on thin surveys and are not supplemented by publicly available data that are more comprehensive, less biased and real-time, based on digital footprints. As a result, we end up publishing misleading survey results.

Thus, we must develop an ecosystem that is high quality, timely and accessible. Big data and artificial intelligence are key elements in such a process. Big data helps acquire real-time information at a granular level and makes data more accessible, scalable and fine-tuned.

For example, a US inflation report released in April 2019 offered an interesting take on how big data was revolutionising data collection. Instead of sending people out to stores to check prices, as it had done for decades (a practice also followed in India), the US Bureau of Labor Statistics gathered apparel prices directly from a big department store. The switch coincided with the largest monthly drop in apparel prices on record. In a similar vein, for India, including items sold online could compress the headline consumer price index even further.


Payments data can also help track economic activity, as is being done in Italy, where different aggregates of the payment system are commonly used, together with other indicators, in GDP forecasting and can provide additional information. Along similar lines, proper use of GST data in India would reveal which sectors yield the most revenue and which are showing month-on-month increases, help predict net revenue growth, and aid fraud detection. Further, as India is a consumption-oriented economy, we must explore measuring GDP using GST data.
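One common way to use such an indicator in GDP forecasting is a "bridge equation" that regresses quarterly GDP growth on a timelier series, then plugs in the latest value of that series before the GDP estimate is out. Below is a minimal Python sketch, assuming a single payments aggregate and synthetic data; the series, the coefficients used to generate them and the one-variable form are all hypothetical and do not describe the models actually used in Italy.

```python
import numpy as np

def fit_bridge(gdp_growth, indicator_growth):
    """OLS fit of gdp_growth = a + b * indicator_growth; returns (a, b)."""
    X = np.column_stack([np.ones_like(indicator_growth), indicator_growth])
    coef, *_ = np.linalg.lstsq(X, gdp_growth, rcond=None)
    return coef[0], coef[1]

rng = np.random.default_rng(1)
# Hypothetical history: 40 quarters of payments-value growth (per cent), with
# GDP growth generated as 2 + 0.4 * payments growth plus noise.
payments = rng.normal(10.0, 3.0, size=40)
gdp = 2.0 + 0.4 * payments + rng.normal(0.0, 0.3, size=40)

a, b = fit_bridge(gdp, payments)

# Payments data for the current quarter arrive before the GDP estimate does.
latest_payments = 8.0  # hypothetical
nowcast = a + b * latest_payments
print(f"a={a:.2f}, b={b:.2f}, GDP growth nowcast={nowcast:.2f}%")
```

In practice such equations combine several indicators; GST collections could in principle enter the same way, though that is an assumption about how the data might be used rather than an existing practice.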

In India, surveys are currently giving contrasting results. For example, the 45.86 per cent weight of food items in the CPI is based on the 2011-12 Consumer Expenditure Survey (CES). This is significantly different from the share of food and beverages (27.6 per cent) in private final consumption expenditure (PFCE) published in the National Accounts Statistics (NAS). If we approximate the CPI using the NAS food weight, headline CPI inflation drops to 5.5 per cent from 7.6 per cent in the latest inflation print.
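The reweighting is straightforward if headline inflation is approximated as a weighted average of group inflation rates. A minimal Python sketch follows, assuming only two groups (food and non-food) and hypothetical group inflation rates chosen for illustration; the official CPI aggregates index levels over many more items, so this shows the direction of the effect, not SBI Research's actual calculation.

```python
def reweighted_headline(group_inflation, weights):
    """Weighted average of group inflation rates (per cent); weights must sum to 1."""
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights sum to {total}, not 1")
    return sum(weights[g] * group_inflation[g] for g in weights)

# Hypothetical year-on-year inflation by group, per cent (illustrative, not actual data).
group_inflation = {"food": 13.0, "non_food": 3.5}

ces_weights = {"food": 0.4586, "non_food": 1 - 0.4586}  # CPI food weight (2011-12 CES)
nas_weights = {"food": 0.276, "non_food": 1 - 0.276}    # food share of PFCE (NAS)

ces_headline = reweighted_headline(group_inflation, ces_weights)
nas_headline = reweighted_headline(group_inflation, nas_weights)
print(f"CES weights: {ces_headline:.2f}%  NAS weights: {nas_headline:.2f}%")
```

Whenever food inflation runs above inflation in the rest of the basket, shifting weight from food to non-food lowers the headline figure; the size of the drop is the weight difference (about 18 percentage points here) times the gap between the two group rates.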

Recent independent research also shows a significant divergence between the consumer price index for industrial workers (CPI-IW) and the consumer price index (urban), even though the two are quite similar in basket composition and target population.

But to be fair to both the RBI and the NSO, the volatility of oil prices and structural changes in the economy make forecasting inflation and GDP a difficult job indeed. However, we should supplement our existing measurement practices with "big data" to make our statistical system more comprehensive and robust.

This article first appeared in the print edition on March 9, 2020 under the title ‘Don’t blame it on NSO’. 

Pulak Ghosh is professor at IIM Bangalore, Soumya Kanti Ghosh is group chief economic advisor, State Bank of India, and Sumit Agarwal is professor at the National University of Singapore. Views are personal.
