The trend in regulatory reporting, and data collection generally, has been towards ever more granular data. xBRL-CSV can efficiently represent the huge volumes of data associated with granular reporting. However, xBRL-CSV is only useful if the correctness of the data can be verified in a reasonable amount of time. XBRL Formula is currently used for many of the checks needed to determine whether the data is correct. This document describes several use cases in which running a Formula validation rule against an open table causes significant performance and memory consumption problems in XBRL processing software.
A new specification is needed to address the problems described in the use cases. It is not intended to solve other performance-related issues.
Using this new specification to address these problems requires updates to taxonomies (replacing inefficient XBRL Formula rules) and new software to run the new constraints. That impacts both taxonomy authors and report creators.
Many reports contain constraints on the presence of specific facts.
For closed tables, where only a fixed set of datapoints may be reported, this is easy to check as a processor just needs to find the fact (or not) using filters on concepts, dimensions and domain members.
For open tables (see an example below), where sets of datapoints are repeated using a typed dimension, such constraints are typically co-constraints: facts are required to be present based on the presence of other facts with the same values for certain dimensions. When implemented using XBRL Formula, such checks can be very inefficient when operating on large documents.
Liability ID | Liability Counterparty | Nominal Amount | Remaining duration | Interest rate | ... |
---|---|---|---|---|---|
C0010 | C0020 | C0030 | C0040 | C0050 | ... |
1 | LEICODE111 | 1000.00 | P3Y2M1D | 4.1 | ... |
2 | LEICODE222 | 15000.00 | P19Y | 3.5 | ... |
... | | | | | |
12345678 | LEICODEABC | 999.00 | P2M2D | 7.9 | ... |
For example: data is collected on liabilities of a commercial bank. For each liability 12 facts must be reported and 13 are optional. All facts share a liability id (typed dimension).
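Because all facts for one liability sit on a single row of the open table, a presence check of this kind can in principle be done in one pass over the rows, without searching the whole report for matching facts. The following is a minimal sketch of that idea, not a definitive implementation. The column ids are taken from the example table above; the choice of which columns are required, the sample data and the function name `missing_required` are assumptions made for illustration.

```python
import csv
import io

# Assumption: these three columns stand in for the facts that must be reported.
REQUIRED_COLUMNS = ["C0020", "C0030", "C0040"]


def missing_required(rows, key_column, required_columns):
    """Yield (key, column) for every required cell that is empty."""
    for row in rows:
        for column in required_columns:
            if not (row.get(column) or "").strip():
                yield row[key_column], column


# Illustrative data only: liability 2 lacks a nominal amount (C0030).
SAMPLE = """C0010,C0020,C0030,C0040,C0050
1,LEICODE111,1000.00,P3Y2M1D,4.1
2,LEICODE222,,P19Y,3.5
"""

rows = csv.DictReader(io.StringIO(SAMPLE))
problems = list(missing_required(rows, "C0010", REQUIRED_COLUMNS))
print(problems)  # [('2', 'C0030')]
```

Each row is inspected once and no state is kept between rows, so memory use does not grow with the size of the table.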
Currently, this type of constraint has to be enforced using XBRL Formula rules. This has a number of drawbacks:
Tables may contain values that reference values in other tables within the report. For example, JC 2023 85 Final report on draft ITS on Register of Information describes the data that must be provided using 15 datasets ("tables") and the relationships between these 15 datasets. To illustrate, each assessment reported must contain the id of the contractual arrangement it is linked to.
Currently these kinds of checks are implemented through XBRL Formula and can be too resource-intensive to run on large reports.
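When the data is treated as tables, a cross-table reference check reduces to a set lookup. The sketch below illustrates this under stated assumptions: the table contents, the column names `arrangement_id` and `assessment_id`, and the function `dangling_references` are hypothetical and do not come from the Register of Information data model.

```python
def dangling_references(child_rows, child_column, parent_rows, parent_column):
    """Return child values that do not occur in the parent column."""
    # Build the set of valid keys once, then test each child row against it.
    parent_keys = {row[parent_column] for row in parent_rows}
    return [
        row[child_column]
        for row in child_rows
        if row[child_column] not in parent_keys
    ]


# Illustrative data only.
arrangements = [{"arrangement_id": "CA-1"}, {"arrangement_id": "CA-2"}]
assessments = [
    {"assessment_id": "A-1", "arrangement_id": "CA-1"},
    {"assessment_id": "A-2", "arrangement_id": "CA-9"},
]

dangling = dangling_references(
    assessments, "arrangement_id", arrangements, "arrangement_id"
)
print(dangling)  # ['CA-9']
```

The cost is one pass over each table plus the memory needed for the set of parent keys.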
Some data models include datasets commonly described as supertypes and subtypes. Records in a subtype dataset provide additional information about records present in the supertype dataset. For example, customers can be either a person or a company. Customer would then be the supertype, and Person and Company would be the subtypes. Properties relevant for both Person and Company (e.g. name) would be placed in Customer. Properties only relevant for Persons (e.g. birthdate) would be placed in Person. Similarly, a property like VAT number would be placed in Company.
Customer, Person and Company would have the same primary key (e.g. id). Every id in Customer must exist in either Person or Company, but may not appear in both.
Implementing such a check with XBRL Formula is complex, and executing it is very resource-intensive on large reports.
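Expressed over key sets, the rule "every id in Customer exists in exactly one of Person or Company" is a counting exercise. The following sketch shows one possible way to evaluate it; the function name `subtype_violations`, the shape of its inputs and the sample ids are assumptions for illustration.

```python
from collections import Counter


def subtype_violations(supertype_ids, subtype_tables):
    """Find supertype ids that appear in no subtype, or in more than one."""
    counts = Counter()
    for ids in subtype_tables.values():
        # set() ensures one table contributes at most once per id.
        counts.update(set(ids))
    supertype_ids = set(supertype_ids)
    return {
        "missing_subtype": sorted(i for i in supertype_ids if counts[i] == 0),
        "multiple_subtypes": sorted(i for i in supertype_ids if counts[i] > 1),
    }


# Illustrative data only: customer 2 has no subtype, customer 3 has two.
result = subtype_violations(
    ["1", "2", "3"],
    {"Person": ["1", "3"], "Company": ["3"]},
)
print(result)
# {'missing_subtype': ['2'], 'multiple_subtypes': ['3']}
```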
It should be possible to include additional metadata for an xBRL-CSV report alongside the taxonomy DTS that specifies additional constraints that are to be applied to data in the report. The required additional constraints are described below.
The metadata must be included in a manner that is backwards-compatible with xBRL-CSV 1.0, such that a conformant xBRL-CSV 1.0 processor that is unaware of the additional metadata will still correctly process the report. This is to ensure that existing receivers and producers of xBRL-CSV can keep using their existing applications if they do not need the Table Constraints features.
It should be possible to enforce that XBRL facts sharing a common value for a dimension, or set of dimensions, appear on a single row within an xBRL-CSV table. This requires the ability to bind dimensions to specific columns, and to enforce uniqueness constraints on a column, or set of columns as a "key".
It should be possible to constrain the type for a column to:
It should be possible to specify that a column must have a value in every row within a table.
It should be possible to specify that a column value takes one of a specified set of allowed values.
It should be possible to constrain a column value to be non-nil.
Where a column contains a value for the Period core dimension, it should be possible to constrain that value to be an instant, or a specified duration, including calendar durations such as "year", "month" and "week".
Where a column contains a date or datetime value, it should be possible to constrain the timezone component to be:
It should be possible to constrain a column value using a regular expression.
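Several of the column-level requirements above (a mandatory value, a set of allowed values, a regular expression) can be evaluated on one cell at a time. The sketch below illustrates this. The `CONSTRAINTS` dictionary is a hypothetical in-memory description and is not the syntax of any specification; column `C0060`, its currency codes and the pattern for `C0020` are likewise invented for the example.

```python
import re

# Hypothetical constraint description, for illustration only.
CONSTRAINTS = {
    "C0020": {"required": True, "pattern": r"[A-Z0-9]{10}"},
    "C0060": {"allowed": {"EUR", "USD"}},
}


def check_row(row, constraints):
    """Return a list of (column, problem) pairs for one row."""
    errors = []
    for column, rule in constraints.items():
        value = row.get(column, "")
        if value == "":
            # An empty cell is only an error when the column is mandatory.
            if rule.get("required"):
                errors.append((column, "missing value"))
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append((column, "value not allowed"))
        # fullmatch requires the whole cell to match the pattern.
        if "pattern" in rule and re.fullmatch(rule["pattern"], value) is None:
            errors.append((column, "pattern mismatch"))
    return errors


print(check_row({"C0020": "LEICODE111", "C0060": "GBP"}, CONSTRAINTS))
# [('C0060', 'value not allowed')]
print(check_row({"C0020": "lei-1", "C0060": ""}, CONSTRAINTS))
# [('C0020', 'pattern mismatch')]
```

Because each row is checked independently, checks of this kind can be split across workers without shared state.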
It should be possible to define a primary key for a table based on one or more columns.
It should be possible to define one or more unique keys for a table based on one or more columns from the table.
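A primary or unique key over one or more columns can be checked by recording each key tuple as it is seen. This is a minimal sketch of that approach; the function name `duplicate_keys` and the sample rows are assumptions, with column ids borrowed from the example table earlier in this document.

```python
def duplicate_keys(rows, key_columns):
    """Return key tuples that occur more than once, in order of detection."""
    seen = set()
    duplicates = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


# Illustrative data only: liability id 1 is reported twice.
rows = [
    {"C0010": "1", "C0020": "LEICODE111"},
    {"C0010": "2", "C0020": "LEICODE222"},
    {"C0010": "1", "C0020": "LEICODE111"},
]

print(duplicate_keys(rows, ["C0010"]))           # [('1',)]
print(duplicate_keys(rows, ["C0010", "C0020"]))  # [('1', 'LEICODE111')]
```

The same function covers single-column and composite keys, since the key is always handled as a tuple.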
It should be possible to define one or more referential keys that constrain the values for the columns in a table to values found in another table.
It should be possible to define that a primary key takes its values from the primary key of one or more other tables. This is to support super-/sub-typing.
A processor should check that the Table Constraints metadata added to the xBRL-CSV metadata is consistent with definitions in the taxonomy: errors should be raised when inconsistencies are discovered. For example, the allowed values listed in a Table Constraint must be part of an enumeration defined in the taxonomy, and the data types must be consistent with element definitions in the taxonomy.
It must be possible to validate a report against the table constraints defined above without reference to the XBRL taxonomy for the report. This requirement is intended to enable a lightweight, easily parallelisable ingestion process prior to applying full XBRL validation on the resulting model. Avoiding a dependency on the taxonomy is intended to reduce start-up time and memory overhead, supporting on-demand parallelisation of this process.
The following requirements were considered, and deemed out of scope of the initial version of the solution.
A mechanism for specifying XBRL Formula rules (or other external rules) that can be applied using only the data in the current table or row.
Optional, additional constraints on identifiers used for xBRL-CSV tables, parameters and columns, to facilitate representation in other systems. For example, requiring that identifiers remain unique if case is ignored, or restricting the allowed characters.
XPath (or other) expressions that can be applied to the set of values in a row.
The ability to obtain the result of applying some aggregation function across all rows in a table. This would allow, for example, the efficient calculation of column totals.