xBRL-CSV requirements 1.0

Requirements Document 4 August 2021

This version
https://www.xbrl.org/REQ/xbrl-csv-requirements/REQ-2021-08-04/xbrl-csv-requirements-2021-08-04.html
Editors
Paul Warren, XBRL International Inc. <pdw@xbrl.org>
Mark Goodhand, CoreFiling <mrg@corefiling.com>
Contributors
Herm Fischer, Mark V Systems <herm@markv.com>
Paul Hulst, De Nederlandsche Bank N.V. <P.J.Hulst@dnb.nl>
Eleanor Joslin, CoreFiling <ejj@corefiling.com>
Daniel Dracott, CoreFiling <djd@corefiling.com>

Table of Contents

1 Overview

The XBRL Open Information Model enables the XBRL report data to be represented in different formats. XBRL v2.1 defines an XML-based syntax for XBRL data (now referred to as "xBRL-XML"), and the OIM Working Group has also defined a JSON-based syntax ("xBRL-JSON"), intended to make XBRL data simple to consume.

This document defines requirements for a CSV-based syntax ("xBRL-CSV").

2 Use cases

As a data format, CSV has the advantages of being simple, extremely widely supported, and potentially very compact, particularly on datasets containing a large number of repeating records.

The growing trend for "granular reporting" (reporting source data in bulk, rather than summarised reports) has resulted in report sizes that push the limits of what is possible using XML and JSON-based formats.

The requirements in this document are intended to address two use cases that benefit particularly from these features.

2.1 Creation and submission of reports containing large amounts of XBRL data

For example, a bank may be required to report information about individual loans. This would require multiple data points (e.g. counterparty, amount outstanding, start date, maturity date) to be reported for each of a potentially very large number of loans.

2.2 Bulk publication of XBRL data

Among text-based formats, CSV is the most popular choice for the distribution of large data sets. A wide range of tools are available for the selection, transformation and visualisation of tabular data in CSV format.

Recipients of XBRL data (such as regulators) may accumulate very large XBRL data sets (either by accepting large reports or a large number of smaller reports). These data sets are best exposed to interested parties in an efficient, easily digestible form.

3 Goals

3.1 Efficiency

The xBRL-CSV format should provide a very efficient representation of reports that consist of a large number of facts.

3.2 Applicability

The xBRL-CSV format should not require significant structural transformation of existing tabular datasets. In other words, it should not require users to split or combine records in order to fit into the xBRL-CSV format.

It should be noted that it is not a requirement to accommodate existing CSV formats without any transformation (see Section 5.2), but the need for such transformations should be limited, and ideally constrained to transforming values to fit datatypes, rather than structural transformation of the data.

3.3 Readability

The CSV data and supporting metadata files should be easy for a human to understand.

3.4 Editability

The supporting metadata files should be easy for a human to edit. The application of xBRL-CSV to very large datasets means that human editability of the CSV data itself is not an important goal.

3.5 Simplicity

The xBRL-CSV format should minimise the burden on document authors and software vendors by avoiding unnecessary complexity.

3.6 Fidelity

The xBRL-CSV format should be capable of faithfully representing the semantic content of an OIM model.

3.7 Familiarity

The xBRL-CSV format should re-use structures and ideas from xBRL-JSON where appropriate.

4 Requirements

4.1 Representing report data in a single table

The simplest representation of an XBRL report in CSV format is as a single document with a column for each dimension.

entity concept period unit value decimals
ent:01 my:Assets 2019-01-01T00:00:00 iso4217:USD 42000000 -6

This representation is convenient for some forms of analysis. For example, it's easy to load it into Excel and use the auto-filter feature to browse the data.

However, a single table can get very sparse if facts use a range of different dimensional qualifications (for example, unit applies only to numeric facts, and language applies only to selected non-numerics), and there is a lot of repeated information in the table (e.g. concept names, entities) so in addition to this simple form, we will want to support alternative arrangements.

4.2 Partitioning report data across multiple tables

Although it is possible to present all facts in a single large table, there are a number of reasons why it may be preferable to split the report into several CSV documents:

xBRL-CSV should support reports that are made up of multiple CSV files with differing structure.

4.3 Sharing information common to all facts in a report

It should be possible to define information that is common to all facts in a report in a single place.

4.3.1 Sharing dimensional information at report level

Some dimension values may be common to all facts in a report. For example, many financial reports apply to a single entity.

To avoid bloating the report with repeated values, and improve readability and editability, it is desirable to define common values in a single place.

4.3.2 Sharing decimals information at report level

Numeric facts in reports are often represented with a common precision (e.g. to the nearest thousand). Where this is true for all or most facts in a report, it is desirable to express the decimals value at report level.

4.4 Sharing information common to all facts in a table

For reports spread across multiple tables, it should be possible to define information that is common to all facts in a table in a single place.

4.4.1 Sharing dimensional information at table level

A report may include separate tables for information about specific regions and time periods. Each of these may use a different currency. It is desirable to be able to define this information once for each table.

2019-europe-eur.csv

[unit=iso4217:EUR, region="r:Europe", period="2019-01-01T00:00:00/2020-01-01T00:00:00"]:

metric value
my:Revenue 1000000
my:Costs 800000
my:Profit 200000

2019-america-usd.csv

[unit=iso4217:USD, region="r:America", period="2019-01-01T00:00:00/2020-01-01T00:00:00"]:

metric value
my:Revenue 2000000
my:Costs 1500000
my:Profit 500000

4.4.2 Sharing decimals information at table level

Where the decimals information is not common to all facts in a report, facts may be arranged into tables such that each fact in a table has the same decimals value. Where such commonality exists, it is useful to be able to capture it at table level.

2019-usd.csv

[decimals=-6, unit=iso4217:USD, period="2019-01-01T00:00:00/2020-01-01T00:00:00"]:

metric value
my:Revenue 2000000
my:Costs 1500000
my:Profit 500000

2019-ratios.csv

[decimals=2, period="2019-01-01T00:00:00/2020-01-01T00:00:00"]:

metric value
my:Margin 0.25

As in the example above, changes in decimals often go hand-in-hand with changes in unit.

4.5 Sharing information across facts in a row

A number of table arrangements allow information to be shared across facts in a row.

4.5.1 Record-style data with a single dimension as key

xBRL-CSV should allow the representation of record-style data, with a key column corresponding to a dimension, and several fact columns for different concepts reported against that dimension.

transaction_id amount from to invoice notes
00000000001 10000 ent:001 ent:007 2019.098 5000 FooWidgets
00000000002 20000 ent:001 ent:007 2019.099 5000 BarWidgets
00000000003 120000 ent:001 ent:008 Payroll

4.5.2 Record-style data with a compound key corresponding to multiple dimensions

Often a record will have a "compound key" formed from multiple fields, which correspond to multiple dimensions in an OIM report.

The key may be formed entirely from taxonomy-defined dimensions, entirely from built-in dimensions, or from a combination of the two.

For example, a report might capture snapshot information about companies over a number of years, using the built-in entity and period dimensions and concept columns for each metric recorded :

entity period number_of_employees number_of_customers
ent:FooCorp 2018-01-01T00:00:00 3450 12350000
ent:BarCorp 2018-01-01T00:00:00 1230 34560000
ent:FooCorp 2019-01-01T00:00:00 3480 12010000
ent:BarCorp 2019-01-01T00:00:00 1220 34890000

Another report might store the names and descriptions of various investment vehicles, combining the built-in language dimension with a taxonomy-defined fund_id:

fund_id language name description
001 en Renewable energy fund Fund composed of companies in the renewable energy sector
001 fr Fonds d'énergie renouvelable Fonds composé d'entreprises du secteur des énergies renouvelables

A third report might contain monetary figures for a number of taxonomy-defined countries, reported in both local currency and US dollars.

country currency period gdp gni
r:UK iso4217:USD 2017-01-01T00:00:00/2018-01-01T00:00:00 2637870000000 2589020000000
r:UK iso4217:GBP 2017-01-01T00:00:00/2018-01-01T00:00:00 2022257088750 1984742732000

4.5.3 Columns for non-concept dimensions

In some table arrangements, the concept varies from row-to-row, but the majority of the columns correspond to other dimensions. For example, a regional breakdown for a number of different metrics may be expressed like this:

metric europe america asia
my:Revenue 1000000 2000000 3000000
my:Costs 800000 1500000 2000000
my:Profit 200000 500000 1000000

In other cases, the columns may correspond to periods:

metric Q1 Q2 Q3 Q4
my:TotalUsers 1000 2000 4000 8000
my:ActiveUsers 13 108 1435 6523

In addition to the record-style forms noted above, with the concept varying from column to column, xBRL-CSV should allow the concept to be specified by one column, with an arbitrary non-concept dimension varying across the others.

4.5.4 Dimensional data applied to a subset of the facts in a row

Sometimes we have dimensional information that varies from line to line, but which only applies to a subset of the columns.

For example, only the 'value' field in the following opportunity records is qualified by currency.

opportunity_id customer product value currency probability
a042fbb5032b0e Foo Corporation UltraWidget 345000 iso4217:USD 0.5
3791e6ddd51b01 Bar International MegaWidget 123000 iso4217:GBP 0.9

4.5.5 Support for instants and durations in a single row

A row of data may include a mixture of instant and duration concepts on separate columns, with facts for the instant columns corresponding to either the start or end of the period used for the duration facts.

To avoid duplication of period information, it should be possible to specify the value used for the period dimension for instant facts as being either the start or end point of a duration period.

For example, a report may track loans of varying durations, taken out at different times. In this case, the capital_repayment corresponds to a duration concept for the capital repayed over the period, while loan_amount and amount_outstanding correspond to an instant concept for the loan balance, at the start and end of the period respectively.

loan_id fixed_rate_period loan_amount capital_repayment amount_outstanding
C18984 2019-01-05/2020-06-05 10000 10000 0
M89128 2017-02-11/2022-02-11 342000 27789 314211
M89197 2018-11-07/2020-11-07 578000 0 578000

4.5.6 Sharing decimals information common to facts in a row

Tables may contain facts with a range of different types. For example, some facts may be monetary, while others are ratios. The monetary facts may be reported to the nearest thousand, while the ratios are precise to two decimal places.

If the concept is specified on each row, it is important that we also allow the decimals information to be specified along side it, e.g.

metric units decimals europe america asia
my:Revenue iso4217:USD -3 1000000 2000000 3000000
my:Costs iso4217:USD -3 800000 1500000 2000000
my:Profit iso4217:USD -3 200000 500000 1000000
my:Margin 2 0.20 0.25 0.33

4.5.7 Specifying facts using data point identifiers

Some reporting environments use a "data point"-centric approach to modelling, where the specific combinations of dimensions to be reported are identified by "data point identifiers". In such environments, it may be desirable for facts to be reported against data point identifiers, rather than against the individual dimensions that they represent. This is effectively giving a shorthand identifier for the "compound keys" described in Section 4.5.2.

datapoint value
dp1234 213000
dp1532 442555
dp1245 0.43
dp4252 37

The data point may not encapsulate all dimensions that form the compound key; data points may need to be repeated for multiple values on some further dimension:

datapoint counterparty value
dp1234 ABCD1234 213000
dp1245 ABCD1234 0.43
dp1234 HIJK4321 111000
dp1245 HIJK4321 0.67

4.6 Sharing information down columns

4.6.1 Binding a single dimension common to all facts in a column

Examples above have already illustrated the binding of a single dimension to a column.

Many tabular layouts use a column per metric. In OIM terms, this means binding the concept dimension on a column-by-column basis, as in this example:

region revenue costs profit
r:Europe 1000000 800000 200000
r:America 2000000 1500000 500000
r:Asia 3000000 2000000 1000000

But it should be possible to specify dimensions other than "concept" on fact columns.

Here is the same information, but with the concept dimension varying row-to-row, and the region bound to each fact-producing column:

metric europe america asia
my:Revenue 1000000 2000000 3000000
my:Costs 800000 1500000 2000000
my:Profit 200000 500000 1000000

4.6.2 Binding multiple dimensions common to all facts in a column

We have previously noted that units are strongly associated with concepts. For tables with concepts bound to columns, where the unit is not common across all columns, it is convenient to be able to specify the unit dimension along with the concept dimension on the columns, e.g.

region revenue ($) costs ($) profit ($) margin number_of_users
r:Europe 1000000 800000 200000 0.20 1194839
r:America 2000000 1500000 500000 0.25 2380108
r:Asia 3000000 2000000 1000000 0.33 9928190

Without this feature, we would need to partition the facts into separate tables.

4.6.3 Facts with differing dimensional qualifications across columns

As well as varying the values for dimensions across columns, it is desirable to support tables where the set of applicable dimensions varies from column to column.

For example, the table below shows a table in which the columns are defined by a mixture of concepts (e.g. "Turnover"), and dimensionally-qualified concepts (e.g. "Assets (Europe)" ). The first column represents a typed dimension which is used to qualify all facts in a row.

Figure 1: Dimensional qualifications varying across columns

Without the ability to do this, this would have to be broken into separate tables, as shown below.

Figure 2: Alternative arrangement to avoid varying dimensions across columns

Though it is sometimes desirable to split tables out like this, it increases the size of the CSV file (see Section 3.1) and we do not want to force this kind of structural rearrangement on users of xBRL-CSV (see Section 3.2).

4.6.4 Sharing decimals information common to all facts in a column

As well as binding the unit at column level, it is useful to be able to specify the decimals.

region revenue ($, -6dp) costs ($, -5dp) profit ($, -5dp) margin (2dp) number_of_users (0dp)
r:Europe 1000000 800000 200000 0.20 1194839
r:America 2000000 1500000 500000 0.25 2380108
r:Asia 3000000 2000000 1000000 0.33 9928190

In keeping with the goal of fidelity, it must be possible to faithfully preserve links in an OIM report when exporting to CSV, so that the information is available to interactive tools, and so that the data can be loaded back into OIM model without data loss. No special handling is required for text footnotes, as these are represented in OIM as facts. While the information must be preserved, it is not necessary for it to be exposed in CSV form (Section 5.1).

4.8 Stable, predictable ids for facts

xBRL-CSV should support the creation of predictable fact IDs, so that facts can be traced back to source data, and to permit the association of external information (such as error messages or comments) with facts.

Fact IDs should not be position-based. In other words, re-ordering rows or columns should not affect the generated IDs.

It is not necessary to support the specification of individual fact IDs (see Section 5.6).

4.9 Consistent treatment of Extensible Enumerations

The syntax used for Extensible Enumerations in xBRL-XML differs between EE 1.0 and EE 2.0, but they are represented consistently in xBRL-JSON (using QNames).

xBRL-CSV should represent EE values in the same way as xBRL-JSON.

4.10 Meta-data re-use

Where xBRL-CSV is used for submission of reports in a filing system (see Section 2.1), it may be desirable for the collector of the reports to specify the report structure to be used, rather than requiring filers to define it individually.

4.10.1 Minimisation of filer-provided metadata

The mechanism for metadata re-use should minimise the amount of metadata that a filer is required to provided. Filer-provided metadata should be limited to information that is specific to the report (for example, the filer's entity identifier, or the reporting period).

4.10.2 Re-use of table structure

It should be possible for the metadata defining the structure of CSV tables to be imported by reference (e.g. using a URL), rather than included as part of the report.

It should be possible for the creator of the table structure to indicate whether individual tables are optional or required in reports that reference those table definitions.

4.10.3 Restriction of metadata extension

Where metadata is re-used (see Section 4.10.2), it should be possible for the re-used metadata to restrict the extent to which an importing report can add to or modify metadata. This should include:

5 Non-requirements

5.1 Pure CSV solution

The process of mapping CSV documents to an OIM Report relies on a range of metadata:

It is theoretically possible to represent all of this information in CSV format, but much of it is more naturally and conveniently represented in JSON. Accordingly, there is no requirement for the supporting metadata to be stored in CSV form.

5.2 Support for existing CSV formats

xBRL-CSV is not required to support the mapping of "arbitrary" CSV file arrangements into XBRL. Although many tools have CSV export functionality already, the implementation of a new reporting requirement using xBRL-CSV or any other format will necessarily involve some degree of integration work, and it is not unreasonable to require some level of transformation of data as part of this.

For example, it is not necessary to support multiple date formats. Implementers can be expected to transform dates into a standard format for use in xBRL-CSV.

5.3 Value transformation

5.3.1 Transformation of values according to scale and sign

Inline XBRL provides features for adjusting the scale and sign of values as part of the mapping process that produces an XBRL report.

Many existing CSV documents would also benefit from these transformations. For example, financial data is often reported in thousands or millions.

However, the use cases for xBRL-CSV do not include the mapping of existing CSV documents (Section 5.2); the focus is on allowing efficient representation of OIM models in CSV form. Accordingly, scale and sign transforms are out of scope for xBRL-CSV 1.0.

5.3.2 Transformation of values using functions

Inline XBRL uses a registry of transformation functions to translate strings in a range of formats to XML Schema data types. For example, to convert 26-Apr-2019 to 2019-04-26.

Similar functionality would be useful for coping with arbitrary CSV files, but this is out of scope for xBRL-CSV 1.0 (see Section 5.2).

5.3.3 Transformation of values using maps defined in metadata

Representation of data in OIM may involve the mapping of external code lists to concepts and domain members in a taxonomy. One way to achieve this would be to place the mapping information in a JSON structure within the metadata file.

For xBRL-CSV 1.0, the assumption is that such transformations will be achieved by pre-processing steps.

5.3.4 Creation of values from a collection of columns

Existing table structures may have several columns that map to a single dimension or fact value in the XBRL data model.

For example, period information may need to be assembled from separate year, month and day columns.

While it may be useful to provide a feature in xBRL-CSV to handle this, it is out of scope for xBRL-CSV 1.0.

5.3.5 Creation of several values from a single column

Existing table structures may have a single columns that maps to multiple dimension or fact values in the XBRL data model.

For example, bank details may be stored as single account + sort code field, which needs to be split into separate account and sort code facts.

While it may be useful to provide a feature in xBRL-CSV to handle this, it is out of scope for xBRL-CSV 1.0.

5.3.6 Creation of values from a combination of cell values and values in the metadata file

In some cases the mapping from existing table structures to XBRL will involve the combination of source data with some fixed information. For example,

By storing the fixed information in the xBRL-CSV metadata, and combining it with source data dynamically as we produce facts, we can improve compactness, readability and editability.

While it may be useful to provide features in xBRL-CSV to handle these use cases, all are out of scope for xBRL-CSV 1.0.

5.4 Non-CSV tabular structures (HTML, Excel)

It is not difficult to imagine extending a specification for mapping CSV tables to cover the same structures embedded in HTML documents and spreadsheets.

Such work is out of scope for xBRL-CSV 1.0.

5.5 Transformation in the absence of DTS information

It is not required to be able to consume xBRL-CSV without access to DTS information. xBRL-CSV may rely on the availability of datatype, dimensional or other taxonomy information in order to correctly consume xBRL-CSV data.

5.6 Individual fact IDs

It is not necessary to support the specification of IDs for individual facts. This has the consequence that it will not be possible to include fact IDs from OIM data from other sources, although the requirement for predictable fact IDs in xBRL-CSV (Section 4.8) means that traceability could be maintained via external mapping tables.

5.7 Referential integrity checks

Reports may contain values that are expected to correspond to identifiers for rows in another table. For example, a table containing loan data may contain more than one loan for a single counterparty, and it may make sense to normalise counterparty information (e.g. name and address) into a separate table that is referenced from records in the loan table.

It is desirable that such relationships can be validated efficiently on the large datasets that xBRL-CSV is intended for use on, but specific functionality to address this is out-of-scope for the initial release of xBRL-CSV.

5.8 Positional markup (CSV files without headings)

Some markup formats for tabular data refer to columns by numerical index.

Such references are brittle, and hard for humans to follow. In the interests of clarity and safety, we require that all xBRL-CSV documents have column headers.

5.9 Human-friendly (non-id-safe) column headings

It is acceptable for xBRL-CSV to constrain the format of column headings in the CSV, in order to meet the syntactic constraints of identifiers.

5.10 Use of existing tabular metadata standards

Significant work on semantic markup for CSV documents has already been undertaken by the W3C's CSV on the Web (CSVW) working group, culminating in the 2015 publication of the Model for Tabular Data and Metadata on the Web.

While this work provides a useful reference point, there are a number of differences between their requirements and ours:

  1. their JSON metadata is all there is (the only place for names, types, labels), whereas we have a taxonomy
  2. they want to cope with CSV files as they exist today in the world, whereas we're willing to impose a number of constraints
  3. they are interested in mapping CSV files to RDF; we have no such requirement

Review of the initial CSVW-based draft of xBRL-CSV (2017-05-02) found that the requirement to conform to the CSVW model was introducing complexity and limiting our design freedom, without offering sufficient compensating benefits.

Accordingly, we will not require xBRL-CSV to be built on this existing work. Instead, we will focus on sharing as much as possible with the xBRL-JSON specification (Section 3.7).

5.11 Support for CSV dialects (and other simple, delimited formats)

Different variants of simple, delimited text formats exist, varying in details such as:

xBRL-CSV does not need to support multiple formats. Instead, it should support a single format that is compatible with the CSV format used by commonly used applications and libraries.

5.12 Metadata in CSV files

CSV files sometimes include metadata in rows preceding the first row of data (or the header row).

For example:

"Data Source","World Development Indicators"
"Last Updated Date","2019-04-24"
"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961","1962"
"United Kingdom","GBR","Trade (% of GDP)","NE.TRD.GNFS.ZS","41.8","40.1","39.0"

In this case, the third row of the table is the header row.

Support for this type of metadata is not required, as it can be adequately captured as taxonomy information, or in separate CSV files within the same report.

5.13 Multiple data point identifiers per fact or per row

Section 4.5.7 describes a requirement for groups of dimension values to be referred to by shorthand data point identifiers. It is not necessary to be able to combine multiple such identifiers when specifying the dimensions for a fact. It is acceptable for any dimensions that are not defined by the data point identifier to be reported explicitly.

Similarly, it is not necessary to support multiple facts using different data point identifiers to be reported on the same row.