xBRL-JSON design 1.0

Working Group Note 4 August 2021

This version
https://www.xbrl.org/WGN/xbrl-json-design/WGN-2021-08-04/xbrl-json-design-2021-08-04.html
Editor
Paul Warren, XBRL International Inc. <pdw@xbrl.org>
Contributor
Mark Goodhand, CoreFiling <mrg@corefiling.com>

1 Overview

This document records the motivation behind certain design decisions made in xBRL-JSON.

2 Number representation

The specification for numbers in JSON (RFC 8259 section 6) sets the expectation that this type is represented by a double-precision, floating point number.

XBRL implementations typically make heavy use of the xs:decimal datatype, as this is what xbrli:monetaryItemType is derived from. xs:decimal is an arbitrary precision decimal number (although implementations are only required to support a minimum of 18 digits).

Representing XBRL numeric values as JSON numbers would result in values being approximated using floating point numbers, resulting in a loss of precision. For example, the number 0.1 cannot be represented exactly as a base 2 floating point number. A simple example of the problems that this can cause is shown below. The sample uses Python, but any language that uses floating point numbers for JSON numbers will see similar issues.

>>> import json
>>> x = json.loads('{"values": [0.1, 0.2], "total": 0.3}')
>>> sum(x["values"]) == x["total"]
False
>>> sum(x["values"])
0.30000000000000004

The xBRL-JSON specification therefore represents numbers as strings, each of which contains a valid lexical representation of the value according to the relevant XML Schema datatype (see Section 3).
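
As a minimal sketch of how a consumer might benefit from this, string values can be parsed into an exact decimal type rather than a binary float. The document layout below mirrors the sample above and is illustrative only; it is not taken from the xBRL-JSON specification.

```python
import json
from decimal import Decimal

# Illustrative document: numeric values are carried as strings, not JSON numbers.
doc = json.loads('{"values": ["0.1", "0.2"], "total": "0.3"}')

# Parse each string as an exact decimal instead of a binary float.
values = [Decimal(v) for v in doc["values"]]
total = Decimal(doc["total"])

print(sum(values) == total)  # True
print(sum(values))           # 0.3
```

Note that Python's Decimal accepts some forms (such as exponents) that xs:decimal does not, so a validating processor would need additional lexical checks.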

3 Use of the XML Schema datatype system

The use of the XML Schema datatype system is described in more detail in the OIM design document.

In designing xBRL-JSON, the Working Group noted that other JSON-based technologies re-use the XML Schema datatype system (XSD). For example, the JSON-LD specification includes many examples that make use of XSD.

4 Fact object vs fact array

Earlier drafts of xBRL-JSON included facts as an array of objects. More recent drafts use an object with fact IDs as keys. The current approach was deemed preferable because:

The initial barrier to adopting this approach was that ID attributes are optional on facts in XBRL v2.1, so an ID may need to be generated when converting from XML.

It was deemed that fact IDs are sufficiently useful for traceability of facts through different representations that they should be made mandatory in the model, and generated in a consistent manner if required.

4.1 Converting fact object to fact array

It has been noted that some tools with generic JSON support do not work with "records" that are members of an object rather than an array. It was felt that there will always be limits to what can be achieved in generic JSON tools when analysing xBRL-JSON, and as such, the advantages of using an object outweigh the benefits of improving support for such tools "out of the box".

A fact object can trivially be converted into a fact array. For example, this can be done using the open source jq utility using the following command:

jq '.facts = (.facts | to_entries | map_values(.value + {id: .key}))'
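
The same conversion can be sketched in a general-purpose language. The following is a minimal Python equivalent, assuming a report whose facts property is an object keyed by fact ID; the sample report and the facts_as_array helper are hypothetical and heavily trimmed.

```python
import json

# Hypothetical, heavily trimmed report: "facts" is an object keyed by fact ID.
report = json.loads("""
{
  "facts": {
    "f1": {"value": "1234", "dimensions": {"concept": "tax:NumericConcept"}},
    "f2": {"value": "5678", "dimensions": {"concept": "tax:OtherConcept"}}
  }
}
""")

def facts_as_array(report):
    """Return the facts as a list of objects, each carrying its ID as "id"."""
    return [
        {**fact, "id": fact_id}
        for fact_id, fact in report.get("facts", {}).items()
    ]

fact_list = facts_as_array(report)
print([f["id"] for f in fact_list])  # ['f1', 'f2']
```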

5 Representation of dimensions

The dimensions associated with a fact are contained within a dimensions object. For example:

 "f923": {
    "value": "1234",
    "decimals": 0,
    "dimensions": {
        "concept": "tax:NumericConcept",
        "entity": "cid:123456789",
        "period": "2015-01-01T00:00:00/2016-01-01T00:00:00",
        "unit": "iso4217:GBP",
        "my:region": "my:UK"
    }
  }

The group also considered a simpler, flatter structure, without an additional container:

 "f923": {
    "value": "1234",
    "decimals": 0,
    "concept": "tax:NumericConcept",
    "entity": "cid:123456789",
    "period": "2015-01-01T00:00:00/2016-01-01T00:00:00",
    "unit": "iso4217:GBP",
    "my:region": "my:UK"
  }

The use of the additional container was chosen because:

  1. It allows extensions to the xBRL-JSON format to include additional information on facts. Without the additional container, any additional keys would be treated as dimensions. This is why the extensibility section permits additional keys on fact objects, but not on dimensions objects.
  2. There is a logical distinction between dimensions and other properties of a fact; if two facts have the same dimensions, they are considered duplicates.
  3. It follows the OIM more closely.
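
The second point can be illustrated with a short sketch. Because all dimensions sit in one container, a candidate duplicate key can be built from that container alone, ignoring other properties such as value and decimals. The facts "f924" and "f925" and the helper names are hypothetical, and the sketch assumes dimension values are strings or null.

```python
from collections import defaultdict

base_dims = {
    "concept": "tax:NumericConcept",
    "entity": "cid:123456789",
    "period": "2015-01-01T00:00:00/2016-01-01T00:00:00",
    "unit": "iso4217:GBP",
    "my:region": "my:UK",
}

# "f924" and "f925" are hypothetical additions to the example above.
facts = {
    "f923": {"value": "1234", "decimals": 0, "dimensions": dict(base_dims)},
    "f924": {"value": "1234.00", "decimals": 2, "dimensions": dict(base_dims)},
    "f925": {"value": "99", "decimals": 0,
             "dimensions": {**base_dims, "my:region": "my:FR"}},
}

def dimension_key(fact):
    """Build a hashable key from the dimensions object only."""
    return frozenset(fact.get("dimensions", {}).items())

def find_duplicates(facts):
    """Group fact IDs that share exactly the same dimensions."""
    groups = defaultdict(list)
    for fact_id, fact in facts.items():
        groups[dimension_key(fact)].append(fact_id)
    return [ids for ids in groups.values() if len(ids) > 1]

print(find_duplicates(facts))  # [['f923', 'f924']]
```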

6 QNames vs Clark names vs URI notation

QNames provide a compact representation of Expanded Names (a namespace URI and local name pair), but they add significant complexity:

The Working Group considered and experimented with using an expanded format instead of QNames: either "Clark notation", i.e. "{uri}localname", or a single URI, e.g. "uri#localname".

Both of these options make processing much simpler, but at the cost of considerably larger documents and a marked loss of readability. Even simple examples become extremely cumbersome if an expanded form is used throughout.

xBRL-JSON therefore retains the use of QNames, but uses a more restrictive form of namespace bindings. This has the benefit that QName equality within a document can be reliably determined using a simple string comparison (i.e. if two QNames have the same prefix, they have the same namespace).
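
As a sketch of the trade-off, the following resolves QNames to expanded names and to Clark notation using a single, document-wide prefix map. The namespaces dictionary, the example.com URIs and both helper functions are hypothetical.

```python
# Hypothetical prefix-to-URI bindings for a single document.
namespaces = {
    "tax": "http://example.com/taxonomy",
    "my": "http://example.com/my",
}

def expand(qname, namespaces):
    """Resolve a prefixed QName to an expanded name: (namespace URI, local name)."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in namespaces:
        raise ValueError("unbound or missing prefix in %r" % qname)
    return namespaces[prefix], local

def clark(qname, namespaces):
    """Render a QName in Clark notation, "{uri}localname"."""
    uri, local = expand(qname, namespaces)
    return "{%s}%s" % (uri, local)

print(expand("tax:NumericConcept", namespaces))
print(clark("my:UK", namespaces))  # {http://example.com/my}UK
```

Within one document, a consumer that only needs to test equality can compare the QName strings directly and skip this resolution step.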

6.1 Reserved prefixes

The specification reserves a number of prefixes that, if used, must be bound to a specified URI. These include the xbrli and iso4217 prefixes. This allows consuming applications to identify QNames within these namespaces without doing a namespace lookup. For example, if an application wishes to use the Euro symbol (€) to represent Euro units, this can be done by finding units with "iso4217:EUR" as the measure.
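
A minimal sketch of the Euro example follows. Only the "iso4217:EUR" mapping comes from the text above; the unit_label helper and its fallback of returning the unit string unchanged are assumptions.

```python
# Only the EUR entry comes from the text; the fallback behaviour is an assumption.
UNIT_SYMBOLS = {"iso4217:EUR": "€"}

def unit_label(unit):
    """Return a display symbol for a known unit, or the unit string unchanged."""
    return UNIT_SYMBOLS.get(unit, unit)

print(unit_label("iso4217:EUR"))  # €
print(unit_label("iso4217:GBP"))  # iso4217:GBP
```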

6.2 Use of QNames as object keys

One of the downsides of using QNames is that ":" is not a permitted character within identifiers in JavaScript (or most other languages). This means that it is not possible to use the more concise notation to directly access object properties, e.g.:

var geog = fact.dimensions.Geography;

Instead, you must use:

var geog = fact.dimensions['eg:Geography'];

xBRL-JSON uses unprefixed names for built-in dimensions (e.g. "concept"), and prefixed names for taxonomy-defined dimensions (see Section 13). It is felt that this provides a reasonable compromise, as it will generally only be built-in dimensions that users will wish to hard-code, e.g.:

var concept = fact.dimensions.concept;
var period = fact.dimensions.period;

Taxonomy-defined dimensions will need to be handled generically, rather than selected with a literal QName:

for (var dim of Object.keys(fact.dimensions)) {
    var dimVal = fact.dimensions[dim];
}

7 Period representation - durations

The period dimension can be either an instant or a duration. In the case of a duration, the Working Group considered modelling the period dimension as a pair of properties (start and end), as a single property with a string value, and as a single property with an object value containing start and end members.

A single property was deemed most appropriate, but this raises the question of how to specify the value. ISO 8601 defines a format for expressing time intervals, but this is poorly supported in common languages, and allows significant flexibility in format. For example, a year starting at 2001-01-01 can be expressed as any of:

2001-01-01T00:00:00/2002-01-01T00:00:00
2001-01-01T00:00:00/P1Y
P1Y/2002-01-01T00:00:00

Given the lack of support for this format in common libraries, it was felt that allowing this full flexibility placed an unreasonable burden on implementers of consuming software, and so a restricted version permitting only the first format has been adopted. This can be split into a pair of ISO 8601 date time values with a simple string split on the / character.
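
The split described above can be sketched as follows. The parse_period helper is hypothetical, as is its convention of returning (None, instant) for an instant period. Note that datetime.fromisoformat in older Python versions does not accept every ISO 8601 form (for example a trailing "Z"), so this is a sketch for the simple case only.

```python
from datetime import datetime

def parse_period(period):
    """Return (start, end) for a duration, or (None, instant) for an instant."""
    start_text, sep, end_text = period.partition("/")
    if sep:
        return datetime.fromisoformat(start_text), datetime.fromisoformat(end_text)
    return None, datetime.fromisoformat(period)

start, end = parse_period("2015-01-01T00:00:00/2016-01-01T00:00:00")
print(start, end)          # 2015-01-01 00:00:00 2016-01-01 00:00:00
print((end - start).days)  # 365
```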

Using a string value was deemed preferable to an object with separate components for consistency with all other dimensions.

8 Period representation - time component

XBRL v2.1 permits the time component of date times to be omitted, but applies a different interpretation depending on the context in which it is used:

This allows durations to be expressed in the natural, inclusive form (for example, we would typically describe 2001 as "1st Jan 2001 to 31st December 2001"), but has led to a significant number of off-by-one bugs in software.

The group considered:

  1. Maintaining the XBRL v2.1 semantics,
  2. Always interpreting a missing time component as 00:00:00 on that day, and
  3. Making the time component mandatory.

The group was keen to avoid repeating the confusion caused by (1), but felt that switching to (2) would cause further confusion due to the differences from v2.1. Although somewhat more verbose, (3) is the easiest for developers to work with, as it uses standard ISO 8601 date times, and requires no special processing at all.

9 Representation of Extensible Enumeration values

The Extensible Enumerations 2.0 specification permits the definition of concepts that take as their value a set of expanded names. Various approaches were considered for the xBRL-JSON representation of such facts:

  1. A JSON list of QName values.
  2. A space-separated list of QNames.
  3. A space-separated list of Clark names or expanded name URIs.

Approach (2) was chosen over (1) because it means that all fact values share the same JSON type of "String". This makes parsing and simple value comparison easier, particularly for processors that do not have type information from a taxonomy available.

The approach taken is consistent with the representation of duration periods (see Section 7).

Whilst a JSON list representation may be easier to work with once parsed, Extensible Enumerations are not widely used, and using a list representation in xBRL-JSON risks simple software throwing type errors when encountering enumeration values. For operations that do need to work with the values as a list, conversion from a space-separated string to a list is trivial in all common programming languages.

Approach (2) was chosen over (3) for consistency with use of QNames elsewhere (see Section 6).
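
For instance, the conversion from a space-separated string to a list is a single call in Python. The enumeration value below is hypothetical.

```python
# Hypothetical enumeration fact value: a space-separated list of QNames.
value = "my:UK my:FR my:DE"

# The value is an ordinary string until a consumer chooses to split it.
members = value.split()
print(members)  # ['my:UK', 'my:FR', 'my:DE']
```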

10 Canonicalisation

The Working Group discussed the extent to which xBRL-JSON documents should be required to be canonicalised. The OIM uses the XML Schema datatyping system (see Section 3), which provides alternative lexical representations of the same value. For example, 1, 01.0 and +1.0 are all alternative representations of the same value.

Similar flexibility arises from the use of QNames (see Section 6), as the use of a different prefix for a namespace does not alter the semantics of a QName value.

Canonicalisation can make comparison of documents and values much easier, as semantic equivalence can be established by simple string comparison. If it is known that values have been canonicalised, you do not need to refer to their datatypes in order to determine equivalence. Such comparison is particularly important for conformance suite tests.
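
The following sketch contrasts the two kinds of comparison for the three representations of the value one given above. It uses Python's Decimal as a stand-in for xs:decimal, which is an approximation: Decimal accepts some forms that xs:decimal does not.

```python
from decimal import Decimal

representations = ["1", "01.0", "+1.0"]

# Compared as strings, the three representations are all different.
print(len(set(representations)))                   # 3

# Compared as typed values, they are all the same number.
print(len({Decimal(r) for r in representations}))  # 1
```

Without canonicalisation, a consumer therefore needs to know the datatype of each value before it can decide whether two values are equivalent.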

The group discussed the possibility of canonicalising prefixes by taking a hash of the namespace URI. For example, http://fasb.org/us-gaap/2019-01-31 can be represented using a prefix derived from its SHA-256 hash: ns-10bea9499cc09aad9b97d384615ad9248599453acf8f5c014fefe91ed622cc72. However, the use of such verbose and opaque strings undermines the benefits of using prefixes in the first place (as noted in Section 6). Shorter hash-based strings such as ns-10bea can function in a restricted environment, but risk collisions across the full space of possible URIs, and they remain opaque. Accordingly, we leave document authors free to choose concise, meaningful prefixes that uniquely identify namespaces within the scope of a report.
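
A hash-derived prefix of the kind discussed could be sketched as below. The hashed_prefix helper is hypothetical, and the sketch assumes that the hash is taken over the UTF-8 bytes of the URI. The text does not state the exact input used, so this may not reproduce the value quoted above.

```python
import hashlib

def hashed_prefix(namespace_uri, length=None):
    """Derive an "ns-" prefix from the SHA-256 hash of a namespace URI.

    Assumes the hash input is the UTF-8 encoding of the URI. If length is
    given, the hexadecimal digest is truncated to that many characters.
    """
    digest = hashlib.sha256(namespace_uri.encode("utf-8")).hexdigest()
    if length is not None:
        digest = digest[:length]
    return "ns-" + digest

uri = "http://fasb.org/us-gaap/2019-01-31"
full = hashed_prefix(uri)      # "ns-" followed by 64 hexadecimal characters
short = hashed_prefix(uri, 5)  # shorter, but more likely to collide
print(len(full), len(short))   # 67 8
```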

The group considered requiring canonical lexical values (as per XML Schema) for facts and dimensions, but concluded that this did not provide sufficient value on its own to warrant the additional burden placed on document creators: even if it were required, comparing documents would still require awareness of QName values, whose prefixes may differ.

Instead, xBRL-JSON provides an optional "canonical values" feature, which prescribes a standard approach for the canonicalisation of values, and allows documents to declare when this has been used.

11 Links

The OIM supports links between pairs of facts. In xBRL-JSON, such links could be represented as a property of either the source or target fact, or entirely separately, referencing both source and target facts by ID.

The former gives an obvious asymmetry in representation, making it easier to traverse the relationship in one direction than the other. It should be noted that relationships are asymmetric in the model, in that the outgoing relationships from a fact are ordered, but incoming relationships are not. Grouping relationships by source fact allows this ordering to be captured concisely by representing the relationships for a fact as an ordered list.

Links are used to model footnotes, as well as other relationships, and it was felt that making it easy to get from a fact to its associated footnote(s) was a benefit. It was also felt that placing link information on facts gave a more intuitive format than a separate block of JSON containing objects where both keys and values are IDs.

Consumers requiring more advanced, bi-directional traversal of relationships would not be unduly inconvenienced by representing link information on facts.
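
A consumer needing traversal in the reverse direction can build an index in a single pass, as sketched below. The layout assumed here, in which each fact has an optional links object mapping link type to group to an ordered list of target fact IDs, is an assumption made for illustration, as are the sample facts and the incoming_links helper.

```python
from collections import defaultdict

# Assumed layout: fact -> "links" -> link type -> group -> ordered target IDs.
facts = {
    "f1": {"value": "1234", "links": {"footnote": {"_": ["n1", "n2"]}}},
    "f2": {"value": "5678", "links": {"footnote": {"_": ["n1"]}}},
    "n1": {"value": "A footnote"},
    "n2": {"value": "Another footnote"},
}

def incoming_links(facts):
    """Map each target fact ID to its (link type, group, source ID) entries."""
    incoming = defaultdict(list)
    for source_id, fact in facts.items():
        # The links property may be absent, so default to an empty object.
        for link_type, groups in fact.get("links", {}).items():
            for group, targets in groups.items():
                for target_id in targets:
                    incoming[target_id].append((link_type, group, source_id))
    return dict(incoming)

index = incoming_links(facts)
print(index["n1"])  # [('footnote', '_', 'f1'), ('footnote', '_', 'f2')]
```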

12 Absent properties vs empty objects/lists

In a number of places in the syntax, JSON objects may have properties whose values are a collection of zero or more items. For example, the facts property on a report has a value which is an object containing zero or more facts.

In this case, an empty collection could be represented by omitting the property altogether, rather than requiring it to be present with no members. This is more concise, but requires a small amount of additional coding in most languages in order to cope with the possibility of an absent property.

Requiring all such properties to be present, even if the collection is empty, would impose an unnecessary overhead on document size. For example, many XBRL documents contain few, if any, links, but the property holding them can be present on any fact in a document. Requiring it to be present would potentially add significantly to the size of the document.

It was felt that the benefit of having a single, consistent approach of allowing all such properties to be omitted outweighed the cost of a small amount of additional code needed to cope with the possible omission of the property, even in cases where the inclusion would not represent a significant overhead.
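
The additional code required is typically a default value supplied at the point of access, as in this minimal sketch. The report shown is hypothetical.

```python
# Hypothetical report in which the "facts" property has been omitted.
report = {"documentInfo": {}}

# Treat an absent property as an empty collection.
facts = report.get("facts", {})

for fact_id, fact in facts.items():
    print(fact_id, fact.get("value"))

print(len(facts))  # 0
```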

13 Namespaces for built-in dimensions

A goal of the Open Information Model has been to provide greater consistency between XBRL v2.1 "built-in" dimensions, such as period, unit, and concept, and taxonomy-defined dimensions, as defined in Dimensions 1.0. To this end, all dimensions in xBRL-JSON are defined in a single JSON object called dimensions.

xBRL-JSON makes use of QNames, but has tried to remove some of the associated complexity of the XML specifications, such as scoped namespace bindings, default namespaces, and non-unique prefixes (i.e. multiple prefixes for the same namespace).

To this end, earlier drafts used a fixed prefix of xbrl for built-in dimensions, but it was felt that this added unnecessary verbosity to the format. Built-in dimensions are now represented using unprefixed names. This has the benefits that:

  1. They can be accessed directly as properties in JavaScript (see Section 6.2).
  2. Built-in and taxonomy-defined dimensions can be easily differentiated by the presence of a : in the name.
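
The second benefit can be sketched as follows, using the dimensions from the example in Section 5. The split_dimensions helper is hypothetical.

```python
dimensions = {
    "concept": "tax:NumericConcept",
    "entity": "cid:123456789",
    "period": "2015-01-01T00:00:00/2016-01-01T00:00:00",
    "unit": "iso4217:GBP",
    "my:region": "my:UK",
}

def split_dimensions(dimensions):
    """Separate built-in dimensions (no ":") from taxonomy-defined ones."""
    built_in = {k: v for k, v in dimensions.items() if ":" not in k}
    taxonomy_defined = {k: v for k, v in dimensions.items() if ":" in k}
    return built_in, taxonomy_defined

built_in, taxonomy_defined = split_dimensions(dimensions)
print(sorted(built_in))  # ['concept', 'entity', 'period', 'unit']
print(taxonomy_defined)  # {'my:region': 'my:UK'}
```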

14 Error codes when validating fact values

The xBRL-JSON specification states that processors "MAY" use the error code oime:invalidFactValue if they are performing full validation of fact values.

This is intended to provide a standard error code for use when validating constraints in specifications that don't prescribe error codes (e.g. XML Schema, XBRL v2.1 and the Data Types Registry), while permitting processors to use any existing codes they may already have in place for those specifications.

By contrast, the Extensible Enumerations 2.0 specification and the Dimensions 1.0 specification do define error codes, and in the interests of interoperability we require that those codes are used.