The SEC Division of Economic and Risk Analysis (DERA) is the linchpin for the SEC’s ever-growing use of structured data and data analytics. DERA was created in 2009, when the SEC was just beginning to require filers to use XBRL. Its stated mission is to “integrate financial economics and rigorous data analytics into the core mission of the SEC.” Specifically, it devises the economic and statistical analyses used throughout the agency. To do this, DERA employs a well-rounded, multidisciplinary team of attorneys, economists, analysts, statisticians, and computer programmers. The fruits of their labors include analytical tools used by the SEC’s Division of Corporation Finance and the Division of Enforcement.
DERA’s current chief economist and director is S.P. Kothari, who came to the SEC from MIT’s Sloan School of Management in Cambridge, Massachusetts. He revisited Cambridge in 2019 to give a speech at a conference, Big Data and High-Performance Computing for Financial Economics, held by the National Bureau of Economic Research. His talk offers valuable insights into the SEC’s current use of big data, its successes and challenges in this area, and the future of big-data science at DERA. He emphasizes that structuring data to be machine-readable in XBRL will make it much easier for investors to analyze information quickly when making investment decisions and much harder for companies to hide accounting fraud from the SEC.
What “big data” means at the SEC
Dr. Kothari begins by outlining the universe of big data in the SEC’s domain. For example, two million separate filings are made in EDGAR every year, each of which is itself a large, elaborate disclosure containing multitudes of data. Another example is the data feed of the Options Price Reporting Authority (OPRA), which generates roughly two terabytes of information every day. To put that in perspective, one terabyte is 1,000 gigabytes. Today’s typical laptop or desktop computer has a maximum storage capacity between 500 and 1,000 gigabytes, so OPRA’s daily data diet would completely fill a hard drive by lunchtime.
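The “by lunchtime” comparison can be checked with a few lines of arithmetic. The Python sketch below assumes, purely for illustration, that the two terabytes arrive at a steady rate over 24 hours (the speech does not say how the flow is spread across the day) and uses decimal units, with 1 terabyte equal to 1,000 gigabytes. The function name is hypothetical.

```python
# Assumption (not from the speech): the ~2 TB/day arrives at a constant
# rate over a 24-hour day. Decimal units: 1 TB = 1,000 GB.
GB_PER_TB = 1_000
DAILY_VOLUME_GB = 2 * GB_PER_TB
HOURS_PER_DAY = 24


def hours_to_fill(drive_capacity_gb: float) -> float:
    """Hours until a drive of the given size is full at the assumed steady rate."""
    gb_per_hour = DAILY_VOLUME_GB / HOURS_PER_DAY
    return drive_capacity_gb / gb_per_hour


# Compare the two ends of the typical capacity range cited in the article.
for capacity_gb in (500, 1_000):
    hours = hours_to_fill(capacity_gb)
    print(f"{capacity_gb} GB drive: full after about {hours:.0f} hours")
```

Under that steady-rate assumption, a 500-gigabyte drive lasts about 6 hours and a 1,000-gigabyte drive about 12, so counting from midnight even the larger drive is full by noon.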
Dr. Kothari lists three commonly held characteristics of big data:
- Volume (the quantity of data)
- Velocity (the speed at which data is created and stored)
- Variety (differences in data types and formats)
“To this list of three,” he notes, “some would add a fourth v, veracity. Veracity is the quality and accuracy of the data.”
Big-data policy challenges: cybersecurity, technology, and communications
The SEC faces several policy challenges—in the realms of cybersecurity, technology, and communications—arising from the use of big data. Big data is hard to store and safeguard; furthermore, the bigger it is, the more enticing it becomes as a target for criminal hackers. Dr. Kothari explains that “portfolio holdings data for all investment advisors are more valuable than portfolio holdings data for one investment advisor, and weekly portfolio holdings data are more valuable than annual portfolio holdings data. These challenges get harder as certain datasets start to include more personally identifiable information (PII) or identifiers that link investors and institutions within and across datasets.”
Echoing a keynote address made only days prior by SEC Chair Jay Clayton, Dr. Kothari reiterates the SEC’s commitment to responsible collection and use of sensitive data from filers. “Naturally, data collection is not an end unto itself: the SEC must not be in the business of ill-defined and indefinite data warehousing,” he says, using the very same turn of phrase as Chair Clayton. For example, Form N-PORT is a new form for disclosing both public and nonpublic fund portfolio holdings. The SEC recently changed the form’s submission deadlines to shrink the volume of sensitive data the agency holds. “This simple change reduced the SEC’s cyber-risk profile without affecting the timing or quantity of information that is made available to the public.”
In the SEC’s never-ending technological arms race with the market, the use of artificial intelligence, machine learning, and related tools is growing among the major Wall Street players and other securities-trading firms. Some technologies, such as artificially intelligent algorithmic trading, are “inherently challenging” for the SEC to monitor. To match wits with the market, the agency has prioritized developing and supporting a workforce with big data skills and experience. In its single decade of operation, Dr. Kothari indicates, DERA has expanded from 30 employees to almost 150.
XBRL helps in turning big data into useful information
Big data must be turned into useful information for all types of market participants, from large pension funds to individual retail investors. An ongoing challenge for the SEC is discovering cost-effective methods to hone the variety of financial data into a readily consumable form without losing substantive information. Dr. Kothari cites the success story of financial disclosures tagged in XBRL. “By dramatically reducing the variety of the data, tagging transitions an electronic document from being human-readable into one that is also machine-readable.” This makes it easier for investors to assess the information and harder for filers to conceal fraud. Tracing the roots of structured data at the SEC back to its XML requirement in 2003, he notes that the SEC’s new requirement for Inline XBRL will further the benefits of structured data. (See C&DIs on interactive data for guidance on the SEC’s Inline XBRL requirements.)
“Structured information can also assist in automating regulatory filings and business information processing,” Dr. Kothari adds. Tagging the numeric and narrative-based disclosure elements of financial statements and risk/return summaries in XBRL standardizes those disclosure items; they can then be processed immediately by software for analysis. “This standardization allows for aggregation, comparison, and large-scale statistical analyses that are less costly and more timely for data users than if the information were reported in an unstructured format.”
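To illustrate why tagging makes aggregation and comparison cheap, here is a minimal Python sketch. It does not parse real XBRL or use real filings: the record layout, the filer names, and every value are hypothetical, and only the tag names (Revenues, NetIncomeLoss) are modeled on US GAAP taxonomy element names. The point is simply that once every filer reports the same concept under the same tag, software can total and compare values without reading each document.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedFact:
    """One tagged disclosure item (hypothetical layout, not a real XBRL schema)."""
    filer: str
    tag: str
    fiscal_year: int
    value: float  # made-up amounts, for illustration only


# Hypothetical facts from two imaginary filers.
FACTS = [
    TaggedFact("FilerA", "Revenues", 2019, 1_200.0),
    TaggedFact("FilerA", "NetIncomeLoss", 2019, 150.0),
    TaggedFact("FilerB", "Revenues", 2019, 800.0),
    TaggedFact("FilerB", "NetIncomeLoss", 2019, -40.0),
]


def total_by_tag(facts):
    """Aggregate: sum each tagged concept across all filers."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact.tag] += fact.value
    return dict(totals)


def net_margin(facts, filer):
    """Compare: net income divided by revenue for one filer."""
    by_tag = {f.tag: f.value for f in facts if f.filer == filer}
    return by_tag["NetIncomeLoss"] / by_tag["Revenues"]


print(total_by_tag(FACTS))
for name in ("FilerA", "FilerB"):
    print(name, f"{net_margin(FACTS, name):.1%}")
```

With unstructured filings, the same comparison would first require locating and interpreting each figure inside every document, which is the cost that standardized tags are meant to remove.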
Tagging also has “network effects.” Data in Forms 10-K can be linked to data in other forms and from other filers, as well as across regulatory and national borders. “[A] key benefit of cross-regulator consistency in tagged data is the ability to understand better the nature of the risks in the financial markets,” Dr. Kothari observes. “The markets today do not stop at national borders, so looking only at intra-national data provides only a partial picture of the system’s risk.”
Bright future for big-data science at the SEC
Dr. Kothari sees many future opportunities for DERA research based on forthcoming massive datasets, from the long-awaited Consolidated Audit Trail (CAT) to the Legal Entity Identifier (LEI). Through the SEC’s enhancement of its data science, “big data will continue to help the SEC and other market regulators identify and shut down bad actors.”