How to Deal With Big Data?
The SEC’s Chief Economist and Director, S.P. Kothari, gave a speech this summer highlighting how standardisation can help tackle the challenges presented by big data.
While big data is certainly not a new concept, the digital era has accelerated data collection immensely, with some estimates indicating that the world now generates more data every two days than all of humanity generated from the dawn of time to the year 2003. The SEC alone receives upwards of two million filings annually – each running to many pages and containing thousands of individual data points.
Kothari explained that big data is characterised by the “three Vs”: the volume of the data, the velocity at which it is created and stored, and the variety of data types and formats (with veracity – the quality and accuracy of the data – often considered a fourth “V”).
While not much can be done to alter the volume and velocity of data collection, the variety of data formatting can be tackled. Kothari highlighted how the introduction of standardised data tagging in machine-readable languages like Inline XBRL has made it simpler and more cost-effective to analyse data. A constant challenge for the SEC (and for securities regulators around the world) is finding ways to further reduce the variety of data without losing information (or veracity).
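To make that concrete, here is a minimal sketch of what tagging buys you: once a fact is tagged in Inline XBRL, software can pull it straight out of a filing, with its concept, period and unit attached, instead of scraping free-form text. The filing fragment and concept name below are invented for illustration, not taken from an actual submission.

```python
# A hedged sketch: extracting a tagged fact from an (invented) Inline XBRL
# fragment. Inline XBRL is ordinary XHTML with machine-readable tags embedded
# in the human-readable text.
import xml.etree.ElementTree as ET

IX = "http://www.xbrl.org/2013/inlineXBRL"

filing = """
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ix="http://www.xbrl.org/2013/inlineXBRL">
  <body>
    <p>Revenue for the year was
      <ix:nonFraction name="us-gaap:Revenues" contextRef="FY2019"
                      unitRef="usd" decimals="-6">1,234</ix:nonFraction>
      million.</p>
  </body>
</html>
"""

root = ET.fromstring(filing)
for fact in root.iter(f"{{{IX}}}nonFraction"):
    # Every tagged fact carries its concept, period and unit with it.
    print(fact.get("name"), fact.get("contextRef"), fact.get("unitRef"), fact.text)
```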
Data tagging also allows regulators to make the most of big data sets, as data in tagged documents can be linked across regulatory and even national boundaries. This gives regulators a clearer picture of risks in financial markets, which nowadays do not stop at national borders. Another way to speed up connections between disparate big data sets is through standardised identifiers like the Legal Entity Identifier (LEI), which offers a single, global way of connecting data sets through accurate identification of legal entities.
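As a small illustration of that linking, the sketch below joins two invented data sets – one shaped like a filings register, one like a set of trade reports – on a shared LEI. All identifiers, names and figures are fabricated; the point is only that a common global identifier turns entity matching into an exact key lookup rather than error-prone name matching.

```python
# Two invented data sets from different regulatory sources, both keyed by LEI.
# (The LEIs, entity names and figures here are fabricated for illustration.)
filings_register = {
    "5493001KJTIIGC8Y1R12": {"entity": "Example Holdings plc", "filings": 42},
    "529900T8BM49AURSDO55": {"entity": "Sample Bank AG", "filings": 17},
}

trade_reports = [
    {"lei": "5493001KJTIIGC8Y1R12", "notional_eur": 1_000_000},
    {"lei": "529900T8BM49AURSDO55", "notional_eur": 250_000},
    {"lei": "213800ABCDEFGH123456", "notional_eur": 75_000},  # no match here
]

# With a shared identifier, the cross-border join is an exact dictionary lookup.
for report in trade_reports:
    entity = filings_register.get(report["lei"])
    if entity:
        print(f'{entity["entity"]}: EUR {report["notional_eur"]:,} reported')
```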
Structuring disclosures is also essential for facilitating research that uses big data: it makes analysis faster and the data easier to access. Machine-readable data can be processed immediately by software, allowing for faster and less costly aggregation, comparison and large-scale statistical analysis. Structured data is key to unlocking big data sets for future research in corporate finance and macroeconomics.
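A final sketch of the kind of analysis this unlocks: with facts already extracted from tagged filings (as in the first sketch above), cross-filing statistics reduce to a few lines. The revenue figures below are invented placeholders.

```python
# Aggregation across filings becomes trivial once the facts are structured.
from statistics import mean, median

revenues = [1_234_000_000, 987_000_000, 2_450_000_000, 310_000_000]

print(f"filings analysed: {len(revenues)}")
print(f"mean revenue:     {mean(revenues):,.0f}")
print(f"median revenue:   {median(revenues):,.0f}")
```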
We’ll be covering this subject at this year’s Data Amplified conference – don’t miss it!
Read the speech in full here.