KDL | January 2025 | |
Marchán & KDL Contributors | Experimental | [Page] |
KDL is a node-oriented document language. Its niche and purpose overlap with XML, as do many of its semantics. You can use KDL both as a configuration language and as a data exchange or storage format, if you so choose.¶
This is the formal specification for KDL, including the intended data model and the grammar.¶
This document describes KDL version 2.0.0. It was released on 2024-12-21. It is the latest stable version of the language, and will only be edited for minor copyedits or major errata.¶
This note is to be removed before publishing as an RFC.¶
Status information for this document may be found at https://datatracker.ietf.org/doc/draft-marchan-kdl2/.¶
Information can be found at https://kdl.dev/.¶
Source for this draft and an issue tracker can be found at https://github.com/kdl-org/kdl.¶
This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International. To view a copy of this license, visit https://creativecommons.org/licenses/by-sa/4.0/¶
KDL 2.0 is designed such that for any given KDL document written as KDL 1.0 or KDL 2.0, the parse will either fail completely, or, if the parse succeeds, the data represented by a v1 or v2 parser will be identical. This means that it's safe to use a fallback parsing strategy in order to support both v1 and v2 simultaneously. For example, node "foo" is a valid node in both versions, and should be represented identically by parsers.¶
A version marker /- kdl-version 2 (or 1) MAY be added to the beginning of a KDL document, optionally preceded by the BOM, and parsers MAY use that as a hint as to which version to parse the document as.¶
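The fallback strategy and the version hint can be sketched as follows. This is a minimal, non-normative illustration in Python: the names version_hint and parse_with_fallback are hypothetical, the parse_v1 and parse_v2 callables stand in for real parsers that are assumed to raise ValueError on failure, and only tab and space are accepted as whitespace inside the marker (a simplification of the full whitespace table).

```python
import re

# Marker shape: optional BOM, "/-", "kdl-version", a version digit, a newline.
# Assumption: only tab and space are treated as whitespace inside the marker.
VERSION_MARKER = re.compile(
    r"\ufeff?/-[ \t]*kdl-version[ \t]+([12])[ \t]*"
    r"(?:\r\n|[\r\n\x85\x0b\x0c\u2028\u2029])"
)


def version_hint(text):
    """Return 1 or 2 if the document opens with a version marker, else None."""
    m = VERSION_MARKER.match(text)
    return int(m.group(1)) if m else None


def parse_with_fallback(text, parse_v2, parse_v1):
    """Try the hinted version first, then fall back to the other one.

    parse_v2 and parse_v1 are hypothetical parser callables that raise
    ValueError when the parse fails.
    """
    if version_hint(text) == 1:
        order = [parse_v1, parse_v2]
    else:
        order = [parse_v2, parse_v1]
    last_error = None
    for parse in order:
        try:
            return parse(text)
        except ValueError as err:
            last_error = err
    raise last_error
```

Because the marker is only a hint, this sketch still tries the other version when the hinted one fails; an implementation could equally choose to stop after the hinted version.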
KDL is a node-oriented document language. Its niche and purpose overlap with XML, as do many of its semantics. You can use KDL both as a configuration language and as a data exchange or storage format, if you so choose.¶
The bulk of this document is dedicated to a long-form description of all Components (Section 3) of a KDL document. There is also a much more terse Grammar (Section 4) at the end of the document that covers most of the rules, with some semantic exceptions involving the data model.¶
KDL is designed to be easy to read and easy to implement.¶
In this document, references to "left" or "right" refer to directions in the data stream towards the beginning or end, respectively; in other words, the directions if the data stream were only ASCII text. They do not refer to the writing direction of text, which can flow in either direction, depending on the characters used.¶
The toplevel concept of KDL is a Document. A Document is composed of zero or more Nodes (Section 3.2), separated by newlines and whitespace, and eventually terminated by an EOF.¶
All KDL documents should be UTF-8 encoded and conform to the specifications in this document.¶
Being a node-oriented language means that the real core component of any KDL document is the "node". Every node must have a name, which must be a String (Section 3.9).¶
The name may be preceded by a Type Annotation (Section 3.8) to further clarify its type, particularly in relation to its parent node. (For example, clarifying that a particular date child node is for the publication date, rather than the last-modified date, with (published)date.)¶
Following the name are zero or more Arguments (Section 3.5) or Properties (Section 3.4), separated by either whitespace (Section 3.17) or a slash-escaped line continuation (Section 3.3). Arguments and Properties may be interspersed in any order, much like is common with positional arguments vs options in command line tools. Collectively, Arguments and Properties may be referred to as "Entries".¶
Children (Section 3.6) can be placed after the name and the optional Entries, possibly separated by either whitespace or a slash-escaped line continuation.¶
Arguments are ordered relative to each other and that order must be preserved in order to maintain the semantics. Properties between Arguments do not affect Argument ordering.¶
By contrast, Properties SHOULD NOT be assumed to be presented in a given order. Children (Section 3.6) should be used if an order-sensitive key/value data structure must be represented in KDL. Cf. JSON objects preserving key order.¶
Nodes MAY be prefixed with Slashdash (Section 3.17.3) to "comment out" the entire node, including its properties, arguments, and children, and make it act as plain whitespace, even if it spreads across multiple lines.¶
Finally, a node is terminated by either a Newline (Section 3.18), a semicolon (;), the end of a child block (}) or the end of the file/stream (an EOF).¶
Line continuations allow Nodes (Section 3.2) to be spread across multiple lines.¶
A line continuation is a \ character followed by zero or more whitespace items (including multiline comments) and an optional single-line comment. It must be terminated by a Newline (Section 3.18) (including the Newline that is part of single-line comments).¶
Following a line continuation, processing of a Node can continue as usual.¶
A Property is a key/value pair attached to a Node (Section 3.2). A Property is composed of a String (Section 3.9), followed immediately by an equals sign (=, U+003D), and then a Value (Section 3.7).¶
Properties should be interpreted left-to-right, with rightmost properties with identical names overriding earlier properties. That is:¶
node a=1 a=2¶
In this example, the node's a value must be 2, not 1.¶
No other guarantees about order should be expected by implementers. Deserialized representations may iterate over properties in any order and still be spec-compliant.¶
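The ordering rules for Entries can be illustrated with a small, non-normative Python sketch. The entry representation used here (tuples tagged "arg" or "prop") and the function name split_entries are assumptions made for the example, not part of KDL.

```python
def split_entries(entries):
    """Separate a node's entries into ordered arguments and a property map.

    `entries` is a hypothetical parser output in source order, where an
    argument is ("arg", value) and a property is ("prop", key, value).
    """
    args = []
    props = {}
    for entry in entries:
        if entry[0] == "arg":
            # Arguments keep their order relative to each other.
            args.append(entry[1])
        else:
            _, key, value = entry
            # A later property with the same key overrides the earlier one.
            props[key] = value
    return args, props
```

For the node `node a=1 a=2`, this yields no arguments and the property map {"a": 2}.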
Properties MAY be prefixed with /- to "comment out" the entire token and make it act as plain whitespace, even if it spreads across multiple lines.¶
An Argument is a bare Value (Section 3.7) attached to a Node (Section 3.2), with no associated key. It shares the same space as Properties (Section 3.4), and may be interleaved with them.¶
A Node may have any number of Arguments, which should be evaluated left to right. KDL implementations MUST preserve the order of Arguments relative to each other (not counting Properties).¶
Arguments MAY be prefixed with /- to "comment out" the entire token and make it act as plain whitespace, even if it spreads across multiple lines.¶
A children block is a block of Nodes (Section 3.2), surrounded by { and }. They are an optional part of nodes, and create a hierarchy of KDL nodes.¶
Regular node termination rules apply, which means multiple nodes can be included in a single-line children block, as long as they're all terminated by ;.¶
A value is either: a String (Section 3.9), a Number (Section 3.14), a Boolean (Section 3.15), or Null (Section 3.16).¶
Values MUST be either Arguments (Section 3.5) or values of Properties (Section 3.4). Only String (Section 3.9) values may be used as Node (Section 3.2) names or Property (Section 3.4) keys.¶
Values (both as arguments and in properties) MAY be prefixed by a single Type Annotation (Section 3.8).¶
A type annotation is a prefix to any Node Name (Section 3.2) or Value (Section 3.7) that includes a suggestion of what type the value is intended to be treated as, or as a context-specific elaboration of the more generic type the node name indicates.¶
Type annotations are written as a set of ( and ) with a single String (Section 3.9) in it. It may contain Whitespace after the ( and before the ), and may be separated from its target by Whitespace.¶
KDL does not specify any restrictions on what implementations might do with these annotations. They are free to ignore them, or use them to make decisions about how to interpret a value.¶
Additionally, the following type annotations MAY be recognized by KDL parsers and, if used, SHOULD be interpreted as follows:¶
Signed integers of various sizes (the number is the bit size):¶
Unsigned integers of various sizes (the number is the bit size):¶
Platform-dependent integer types, both signed and unsigned:¶
IEEE 754 floating point numbers, both single (32) and double (64) precision:¶
IEEE 754-2008 decimal floating point numbers¶
date-time: ISO8601 date/time format.¶
time: "Time" section of ISO8601.¶
date: "Date" section of ISO8601.¶
duration: ISO8601 duration format.¶
decimal: IEEE 754-2008 decimal string format.¶
currency: ISO 4217 currency code.¶
country-2: ISO 3166-1 alpha-2 country code.¶
country-3: ISO 3166-1 alpha-3 country code.¶
country-subdivision: ISO 3166-2 country subdivision code.¶
email: RFC5322 email address.¶
idn-email: RFC6531 internationalized email address.¶
hostname: RFC1123 internet hostname (only ASCII segments).¶
idn-hostname: RFC5890 internationalized internet hostname (only ASCII "punycode" segments prefixed with xn--, or non-ASCII segments).¶
ipv4: RFC2673 dotted-quad IPv4 address.¶
ipv6: RFC2373 IPv6 address.¶
url: RFC3986 URI.¶
url-reference: RFC3986 URI Reference.¶
irl: RFC3987 Internationalized Resource Identifier.¶
irl-reference: RFC3987 Internationalized Resource Identifier Reference.¶
url-template: RFC6570 URI Template.¶
uuid: RFC4122 UUID.¶
regex: Regular expression. Specific patterns may be implementation-dependent.¶
base64: A Base64-encoded string, denoting arbitrary binary data.¶
Strings in KDL represent textual UTF-8 Values (Section 3.7). A String is either an Identifier String (Section 3.10) (like foo), a Quoted String (Section 3.11) (like "foo") or a Multi-Line String (Section 3.12). Both Quoted and Multi-Line Strings come in normal and Raw String (Section 3.13) variants (like #"foo"#):¶
Identifier Strings let you write short, "single-word" strings with a minimum of syntax¶
Quoted Strings let you write strings "like normal", with whitespace and escapes.¶
Multi-Line Strings let you write strings across multiple lines and with indentation that's not part of the string value.¶
Raw Strings don't allow any escapes, allowing you to not worry about the string's content containing anything that might look like an escape.¶
Strings MUST be represented as UTF-8 values.¶
Strings MUST NOT include the code points for disallowed literal code points (Section 3.19) directly. Quoted and Multi-Line Strings may include these code points as values by representing them with their corresponding \u{...} escape.¶
An Identifier String (sometimes referred to as just an "identifier") is composed of any Unicode Scalar Value other than non-initial characters (Section 3.10.1), followed by any number of Unicode Scalar Values other than non-identifier characters (Section 3.10.2).¶
A handful of patterns are disallowed, to avoid confusion with other values:¶
idents that appear to start with a Number (Section 3.14) (like 1.0v2 or -1em) or the "almost a number" pattern of a decimal point without a leading digit (like .1).¶
idents that are the language keywords (inf, -inf, nan, true, false, and null) without their leading #.¶
Identifiers that match these patterns MUST be treated as a syntax error; such values can only be written as quoted or raw strings. The precise details of the identifier syntax are specified in the Full Grammar in Section 4.¶
The following characters cannot be the first character in an Identifier String (Section 3.10):¶
Any decimal digit (0-9)¶
Any non-identifier characters (Section 3.10.2)¶
Additionally, the following initial characters impose limitations on subsequent characters:¶
the + and - characters can only be used as an initial character if the second character is not a digit. If the second character is ., then the third character must not be a digit.¶
the . character can only be used as an initial character if the second character is not a digit.¶
This allows identifiers to look like --this or .md, and removes the ambiguity of having an identifier look like a number.¶
The following characters cannot be used anywhere in an Identifier String (Section 3.10):¶
Any of (){}[]/\"#;=¶
Any Whitespace (Section 3.17) or Newline (Section 3.18).¶
Any disallowed literal code points (Section 3.19) in KDL documents.¶
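The identifier rules above can be approximated with the following non-normative Python sketch. The function name is_identifier_string is hypothetical. As a stated simplification, Python's str.isspace() stands in for the Whitespace and Newline tables, and the disallowed literal code points are not checked.

```python
NON_IDENTIFIER = set('(){}[]/\\"#;=')
KEYWORDS = {"inf", "-inf", "nan", "true", "false", "null"}
DIGITS = "0123456789"


def is_identifier_string(s):
    """Approximate check of the Identifier String rules."""
    if not s or s in KEYWORDS:
        return False
    # Simplification: str.isspace() approximates the Whitespace and Newline
    # tables; disallowed literal code points are not checked here.
    if any(c in NON_IDENTIFIER or c.isspace() for c in s):
        return False
    rest = s
    if rest[0] in "+-":
        rest = rest[1:]  # a leading sign is allowed, with restrictions
    if rest[:1] == ".":
        rest = rest[1:]  # so is a leading dot, optionally after a sign
    # After any leading sign and/or dot, the next character must not be a digit.
    return not (rest and rest[0] in DIGITS)
```

With this sketch, foo, --this and .md are accepted, while 1.0v2, -1em, .1 and true are rejected.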
A Quoted String is delimited by " on either side of any number of literal string characters except unescaped " and \.¶
Literal Newline (Section 3.18) characters can only be included if they are Escaped Whitespace (Section 3.11.1.1), which discards them from the string value. Actually including a newline in the value requires using a newline escape sequence, like \n, or using a Multi-Line String (Section 3.12) which is actually designed for strings stretching across multiple lines.¶
Like Identifier Strings, Quoted Strings MUST NOT include any of the disallowed literal code-points (Section 3.19) as code points in their body.¶
Quoted Strings have a Raw String (Section 3.13) variant, which disallows escapes.¶
In addition to literal code points, a number of "escapes" are supported in Quoted Strings. "Escapes" are the character \ followed by another character, and are interpreted as described in the following table:¶
Name | Escape | Code Pt |
---|---|---|
Line Feed | \n | U+000A |
Carriage Return | \r | U+000D |
Character Tabulation (Tab) | \t | U+0009 |
Reverse Solidus (Backslash) | \\ | U+005C |
Quotation Mark (Double Quote) | \" | U+0022 |
Backspace | \b | U+0008 |
Form Feed | \f | U+000C |
Space | \s | U+0020 |
Unicode Escape | \u{(1-6 hex chars)} | Code point described by hex characters, as long as it represents a Unicode Scalar Value |
Whitespace Escape | See below | N/A |
In addition to escaping individual characters, \ can also escape whitespace. When a \ is followed by one or more literal whitespace characters, the \ and all of that whitespace are discarded. For example,¶
"Hello World"¶
and¶
"Hello \ World"¶
are semantically identical. See whitespace (Section 3.17) and newlines (Section 3.18) for how whitespace is defined.¶
Note that only literal whitespace is escaped; whitespace escapes (\n and such) are retained. For example, these strings are all semantically identical:¶
"Hello\ \nWorld"
"Hello\n\ World"
"Hello\nWorld"
"""
  Hello
  World
  """¶
Except as described in the escapes table, above, \ MUST NOT precede any other characters in a string.¶
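The escape rules for Quoted Strings can be sketched as below. This is a non-normative Python illustration: the function name unescape is hypothetical, and its input is assumed to be the text between the quotes, already checked for unescaped " characters.

```python
import re

SIMPLE = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"',
          "b": "\b", "f": "\f", "s": " "}
# Whitespace and Newline code points from Sections 3.17 and 3.18.
SKIPPABLE = (set("\t \u00a0\u1680\u202f\u205f\u3000")
             | {chr(c) for c in range(0x2000, 0x200B)}
             | set("\r\n\u0085\u000b\u000c\u2028\u2029"))
UNICODE_ESCAPE = re.compile(r"u\{([0-9a-fA-F]{1,6})\}")


def unescape(body):
    """Resolve backslash escapes in the body of a Quoted String."""
    out = []
    i = 0
    while i < len(body):
        if body[i] != "\\":
            out.append(body[i])
            i += 1
            continue
        nxt = body[i + 1:i + 2]
        m = UNICODE_ESCAPE.match(body, i + 1)
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        elif m:
            cp = int(m.group(1), 16)
            if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                raise ValueError("escape is not a Unicode Scalar Value")
            out.append(chr(cp))
            i = m.end()
        elif nxt and nxt in SKIPPABLE:
            # Whitespace escape: drop the backslash and all literal
            # whitespace and newlines that follow it.
            i += 1
            while i < len(body) and body[i] in SKIPPABLE:
                i += 1
        else:
            raise ValueError("invalid escape at index %d" % i)
    return "".join(out)
```

Note that this sketch only covers Quoted Strings; Multi-Line Strings additionally require dedenting between the whitespace-escape step and the remaining escapes.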
Multi-Line Strings support multiple lines with literal, non-escaped Newlines. They must use a special multi-line syntax, and they automatically "dedent" the string, allowing its value to be indented to a visually matching level as desired.¶
A Multi-Line String is opened and closed by three double-quote characters, like """. Its first line MUST immediately start with a Newline (Section 3.18) after its opening """. Its final line MUST contain only whitespace before the closing """. All in-between lines that contain non-newline, non-whitespace characters MUST start with at least the exact same whitespace as the final line (precisely matching codepoints, not merely counting characters or "size"); they may contain additional whitespace following this prefix. The lines in between may contain unescaped " (but no unescaped """ as this would close the string).¶
The value of the Multi-Line String omits the first and last Newline, the Whitespace of the last line, and the matching Whitespace prefix on all intermediate lines. The first and last Newline can be the same character (that is, empty multi-line strings are legal).¶
In other words, the final line specifies the whitespace prefix that will be removed from all other lines.¶
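The dedent rule can be sketched as follows. This is a non-normative Python illustration with a hypothetical function name, dedent_multiline. It assumes its input is the text between the opening and closing """ and that no backslash escapes are involved (as in a Raw String, or after whitespace escapes have already been resolved).

```python
import re

# Newline sequences from Section 3.18; CRLF is listed first so that it is
# consumed as a single Newline.
NEWLINE = re.compile("\r\n|[\r\n\u0085\u000b\u000c\u2028\u2029]")
# Non-Newline whitespace from Section 3.17.
WS = set("\t \u00a0\u1680\u202f\u205f\u3000") | {chr(c) for c in range(0x2000, 0x200B)}


def dedent_multiline(inner):
    """Compute the value of a Multi-Line String from its raw inner text."""
    lines = NEWLINE.split(inner)
    if len(lines) < 2 or lines[0] != "":
        raise ValueError("a Newline must immediately follow the opening quotes")
    prefix = lines[-1]
    if any(c not in WS for c in prefix):
        raise ValueError("the final line must contain only whitespace")
    body = []
    for line in lines[1:-1]:
        if all(c in WS for c in line):
            body.append("")  # whitespace-only lines become empty lines
        elif line.startswith(prefix):
            body.append(line[len(prefix):])
        else:
            raise ValueError("line does not start with the closing line's prefix")
    # Literal Newlines are normalized to a single LF.
    return "\n".join(body)
```

Comparing prefixes with startswith matches the requirement that the whitespace be identical code point for code point, rather than merely the same width.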
Multi-Line Strings that do not immediately start with a Newline, or whose final """ is not preceded by optional whitespace and a Newline, are illegal. This also means that """ may not be used for a single-line String (e.g. """foo""").¶
Literal Newline sequences in Multi-Line Strings must be normalized to a single U+000A (LF) during deserialization. This means, for example, that CR LF becomes a single LF during parsing.¶
This normalization does not apply to non-literal Newlines entered using escape sequences. That is:¶
multi-line """
    \r\n[CRLF]
    foo[CRLF]
    """¶
becomes:¶
single-line "\r\n\nfoo"¶
For clarity: this normalization applies to each individual Newline sequence. That is, the literal sequence CRLF CRLF becomes LF LF, not LF.¶
multi-line """
        foo
    This is the base indentation
            bar
    """¶
This example's string value will be:¶
    foo
This is the base indentation
        bar¶
which is equivalent to¶
"    foo\nThis is the base indentation\n        bar"¶
when written as a single-line string.¶
If the last line wasn't indented as far, it won't dedent the rest of the lines as much:¶
multi-line """
        foo
    This is no longer on the left edge
            bar
  """¶
This example's string value will be:¶
      foo
  This is no longer on the left edge
          bar¶
Equivalent to¶
"      foo\n  This is no longer on the left edge\n          bar"¶
Empty lines can contain any whitespace, or none at all, and will be reflected as empty in the value:¶
multi-line """
    Indented a bit.

    A second indented paragraph.
    """¶
This example's string value will be:¶
Indented a bit.

A second indented paragraph.¶
Equivalent to¶
"Indented a bit.\n\nA second indented paragraph."¶
The following yield syntax errors:¶
multi-line """can't be single line"""¶
multi-line """
  closing quote with non-whitespace prefix"""¶
multi-line """stuff
  """¶
// Every line must share the exact same prefix as the closing line.
multi-line """[\n]
[tab]a[\n]
[space][space]b[\n]
[space][tab][\n]
[tab]"""¶
Multi-line strings support the same mechanism for escaping whitespace as Quoted Strings.¶
When processing a Multi-Line String, implementations MUST dedent the string after resolving all whitespace escapes, but before resolving other backslash escapes. This means a whitespace escape that attempts to escape the final line's newline and/or whitespace prefix can be invalid: if removing escaped whitespace places the closing """ on a line with non-whitespace characters, this escape is invalid.¶
For example, the following example is illegal:¶
"""
foo
bar\
"""
// equivalent to
"""
foo
bar"""¶
while the following example is allowed¶
"""
  foo \
bar
  baz
  \   """
// equivalent to
"""
  foo bar
  baz
  """¶
Both Quoted (Section 3.11) and Multi-Line Strings (Section 3.12) have Raw String variants, which are identical in syntax except they do not support \-escapes. This includes line-continuation escapes (\ + ws collapsing to nothing). They otherwise share the same properties as far as literal Newline (Section 3.18) characters go, multi-line rules, and the requirement of UTF-8 representation.¶
The Raw String variants are indicated by preceding the string's opening quotes with one or more # characters. The string is then closed by its normal closing quotes, followed by a matching number of # characters. This means that the string may contain any combination of " and # characters other than its closing delimiter (e.g., if a raw string starts with ##", it can contain " or "#, but not "## or "###).¶
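The delimiter-matching rule can be sketched as follows. This non-normative Python illustration uses a hypothetical function name, read_raw_string, and is deliberately limited to single-line Raw Strings; it does not check for literal Newlines or disallowed code points in the body.

```python
def read_raw_string(src, start=0):
    """Read a single-line Raw String starting at src[start].

    Returns (body, end_index), where end_index is just past the closing
    delimiter.
    """
    i = start
    while i < len(src) and src[i] == "#":
        i += 1
    hashes = i - start
    if hashes == 0 or i >= len(src) or src[i] != '"':
        raise ValueError("not a raw string")
    if src.startswith('"""', i):
        raise ValueError("multi-line raw strings are out of scope for this sketch")
    # The closing delimiter is a quote followed by the same number of '#'.
    closer = '"' + "#" * hashes
    # The first occurrence of the closer ends the string.
    end = src.find(closer, i + 1)
    if end == -1:
        raise ValueError("unterminated raw string")
    return src[i + 1:end], end + len(closer)
```

Searching for the first occurrence of the full closing delimiter mirrors the rule that a raw string opened with ##" may contain " or "# but ends at the first "##.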
Like other Strings, Raw Strings MUST NOT include any of the disallowed literal code-points (Section 3.19) as code points in their body. Unlike with Quoted Strings, these cannot simply be escaped, and are thus unrepresentable when using Raw Strings.¶
just-escapes #"\n will be literal"#¶
The string contains the literal characters \n will be literal.¶
quotes-and-escapes ##"hello\n\r\asd"#world"##¶
The string contains the literal characters hello\n\r\asd"#world¶
raw-multi-line #"""
    Here's a """
        multiline string
        """
    without escapes.
    """#¶
The string contains the value¶
Here's a """
    multiline string
    """
without escapes.¶
or equivalently,¶
"Here's a \"\"\"\n    multiline string\n    \"\"\"\nwithout escapes."¶
as a Quoted String.¶
Numbers in KDL represent numerical Values (Section 3.7). There is no logical distinction in KDL between real numbers, integers, and floating point numbers. It's up to individual implementations to determine how to represent KDL numbers.¶
There are five syntaxes for Numbers: Keywords, Decimal, Hexadecimal, Octal, and Binary.¶
All non-Keyword (Section 3.14.1) numbers may optionally start with one of - or +, which determine whether they'll be negative or positive.¶
Binary numbers start with 0b and only allow 0 and 1 as digits, which may be separated by _. They represent numbers in radix 2.¶
Octal numbers start with 0o and only allow digits between 0 and 7, which may be separated by _. They represent numbers in radix 8.¶
Hexadecimal numbers start with 0x and allow digits between 0 and 9, as well as letters A through F, in either lower or upper case, which may be separated by _. They represent numbers in radix 16.¶
Decimal numbers are a bit more special:¶
They have no radix prefix.¶
They use digits 0 through 9, which may be separated by _.¶
They may optionally include a decimal separator ., followed by more digits, which may again be separated by _.¶
They may optionally be followed by E or e, an optional - or +, and more digits, to represent an exponent value.¶
Note that, similar to JSON and some other languages, numbers without an integer digit (such as .1) are illegal. They must be written with at least one integer digit, like 0.1. (These patterns are also disallowed from Identifier Strings (Section 3.10), to avoid confusion.)¶
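The number syntaxes, including the keyword numbers, can be sketched as follows. This is a non-normative Python illustration with a hypothetical function name, parse_number. Mapping KDL numbers onto Python int and float is an assumption made for the example; KDL leaves the representation of numbers up to the implementation.

```python
import re

KEYWORD_NUMBERS = {
    "#inf": float("inf"),
    "#-inf": float("-inf"),
    "#nan": float("nan"),
}
RADIX = re.compile(r"([+-]?)0([xob])([0-9a-fA-F][0-9a-fA-F_]*)")
DECIMAL = re.compile(r"[+-]?[0-9][0-9_]*(\.[0-9][0-9_]*)?([eE][+-]?[0-9][0-9_]*)?")
BASES = {"x": 16, "o": 8, "b": 2}


def parse_number(token):
    """Convert a KDL number token to a Python int or float."""
    if token in KEYWORD_NUMBERS:
        return KEYWORD_NUMBERS[token]
    m = RADIX.fullmatch(token)
    if m:
        sign, kind, digits = m.groups()
        # int() raises ValueError if a digit is out of range for the radix.
        value = int(digits.replace("_", ""), BASES[kind])
        return -value if sign == "-" else value
    m = DECIMAL.fullmatch(token)
    if not m:
        raise ValueError("not a KDL number: %r" % token)
    text = token.replace("_", "")
    if m.group(1) or m.group(2):
        return float(text)  # has a fractional part or an exponent
    return int(text)
```

Tokens such as .1, 1. and _1 are rejected because every digit run must begin with a digit.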
There are three special "keyword" numbers included in KDL to accommodate the widespread use of IEEE 754 floats:¶
#inf - floating point positive infinity.¶
#-inf - floating point negative infinity.¶
#nan - floating point NaN/Not a Number.¶
To go along with this and prevent foot guns, the bare Identifier Strings (Section 3.10) inf, -inf, and nan are considered illegal identifiers and should yield a syntax error.¶
The existence of these keywords does not imply that any numbers be represented as IEEE 754 floats. These are simply for clarity and convenience for any implementation that chooses to represent their numbers in this way.¶
A boolean Value (Section 3.7) is either the symbol #true or #false. These SHOULD be represented by implementation as boolean logical values, or some approximation thereof.¶
The symbol #null represents a null Value (Section 3.7). It's up to the implementation to decide how to represent this, but it generally signals the "absence" of a value.¶
The following characters should be treated as non-Newline (Section 3.18) white space:¶
Name | Code Pt |
---|---|
Character Tabulation | U+0009 |
Space | U+0020 |
No-Break Space | U+00A0 |
Ogham Space Mark | U+1680 |
En Quad | U+2000 |
Em Quad | U+2001 |
En Space | U+2002 |
Em Space | U+2003 |
Three-Per-Em Space | U+2004 |
Four-Per-Em Space | U+2005 |
Six-Per-Em Space | U+2006 |
Figure Space | U+2007 |
Punctuation Space | U+2008 |
Thin Space | U+2009 |
Hair Space | U+200A |
Narrow No-Break Space | U+202F |
Medium Mathematical Space | U+205F |
Ideographic Space | U+3000 |
Any text after //, until the next literal Newline (Section 3.18) is "commented out", and is considered to be Whitespace (Section 3.17).¶
In addition to single-line comments using //, comments can also be started with /* and ended with */. These comments can span multiple lines. They are allowed in all positions where Whitespace (Section 3.17) is allowed and can be nested.¶
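Because multi-line comments nest, a scanner has to track depth rather than stop at the first */. A minimal, non-normative Python sketch, using the hypothetical function name skip_block_comment:

```python
def skip_block_comment(src, start):
    """Return the index just past the comment that opens at src[start]."""
    if not src.startswith("/*", start):
        raise ValueError("no multi-line comment at this position")
    depth = 0
    i = start
    while i < len(src):
        if src.startswith("/*", i):
            depth += 1  # a nested comment opens
            i += 2
        elif src.startswith("*/", i):
            depth -= 1  # the innermost open comment closes
            i += 2
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated multi-line comment")
```

For example, in /* a /* b */ c */ node the comment ends after the second */, not the first.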
Finally, a special kind of comment called a "slashdash", denoted by /-, can be used to comment out entire components of a KDL document logically, and have those elements not be included as part of the parsed document data.¶
Slashdash comments can be used before the following, including before their type annotations, if present:¶
A Node (Section 3.2): the entire Node is treated as Whitespace, including all props, args, and children.¶
An Argument (Section 3.5): the Argument value is treated as Whitespace.¶
A Property (Section 3.4) key: the entire property, including both key and value, is treated as Whitespace. A slashdash of just the property value is not allowed.¶
A Children Block (Section 3.6): the entire block, including all children within, is treated as Whitespace. Only other children blocks, whether slashdashed or not, may follow a slashdashed children block.¶
A slashdash may be followed by any amount of whitespace, including newlines and comments (other than other slashdashes), before the element that it comments out.¶
The following character sequences should be treated as new lines:¶
Acronym | Name | Code Pt |
---|---|---|
CRLF | Carriage Return and Line Feed | U+000D + U+000A |
CR | Carriage Return | U+000D |
LF | Line Feed | U+000A |
NEL | Next Line | U+0085 |
VT | Vertical tab | U+000B |
FF | Form Feed | U+000C |
LS | Line Separator | U+2028 |
PS | Paragraph Separator | U+2029 |
Note that for the purpose of new lines, the specific sequence CRLF is considered a single newline.¶
The following code points may not appear literally anywhere in the document. They may be represented in Strings (but not Raw Strings) using Unicode Escapes (Section 3.11.1) (\u{...}, except for code points that are not Unicode Scalar Values, which can't be represented even as escapes).¶
The codepoints U+0000-0008 or the codepoints U+000E-001F (various control characters).¶
U+007F (the Delete control character).¶
Any codepoint that is not a Unicode Scalar Value (U+D800-DFFF).¶
U+200E-200F, U+202A-202E, and U+2066-2069, the unicode "direction control" characters.¶
U+FEFF, aka Zero-width Non-breaking Space (ZWNBSP)/Byte Order Mark (BOM), except as the first code point in a document.¶
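The list above translates directly into a range check. The following non-normative Python sketch uses a hypothetical function name, first_disallowed, and scans an already-decoded document for the first offending code point.

```python
DISALLOWED_RANGES = [
    (0x0000, 0x0008),  # control characters
    (0x000E, 0x001F),  # control characters
    (0x007F, 0x007F),  # Delete
    (0xD800, 0xDFFF),  # not Unicode Scalar Values
    (0x200E, 0x200F),  # direction control
    (0x202A, 0x202E),  # direction control
    (0x2066, 0x2069),  # direction control
]


def first_disallowed(text):
    """Return the index of the first disallowed literal code point, or -1."""
    for i, ch in enumerate(text):
        cp = ord(ch)
        if cp == 0xFEFF:
            if i != 0:
                return i  # a BOM is only allowed as the first code point
            continue
        if any(lo <= cp <= hi for lo, hi in DISALLOWED_RANGES):
            return i
    return -1
```

Tab, the Newline characters and the other Whitespace code points fall outside these ranges and are therefore accepted.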
This is the full official grammar for KDL and should be considered authoritative if something seems to disagree with the text above. The grammar language syntax is defined in Section 4.1.¶
document := bom? version? nodes

// Nodes
nodes := (line-space* node)* line-space*

base-node := slashdash? type? node-space* string
    (node-space+ slashdash? node-prop-or-arg)*
    // slashdashed node-children must always be after props and args.
    (node-space+ slashdash node-children)*
    (node-space+ node-children)?
    (node-space+ slashdash node-children)*
    node-space*
node := base-node node-terminator
final-node := base-node node-terminator?

// Entries
node-prop-or-arg := prop | value
node-children := '{' nodes final-node? '}'
node-terminator := single-line-comment | newline | ';' | eof

prop := string node-space* '=' node-space* value
value := type? node-space* (string | number | keyword)
type := '(' node-space* string node-space* ')'

// Strings
string := identifier-string | quoted-string | raw-string ¶

identifier-string := unambiguous-ident | signed-ident | dotted-ident
unambiguous-ident :=
    ((identifier-char - digit - sign - '.') identifier-char*)
    - disallowed-keyword-identifiers
signed-ident :=
    sign ((identifier-char - digit - '.') identifier-char*)?
dotted-ident :=
    sign? '.' ((identifier-char - digit) identifier-char*)?
identifier-char :=
    unicode - unicode-space - newline - [\\/(){};\[\]"#=]
    - disallowed-literal-code-points
disallowed-keyword-identifiers :=
    'true' | 'false' | 'null' | 'inf' | '-inf' | 'nan'

quoted-string :=
    '"' single-line-string-body '"' |
    '"""' newline
    (multi-line-string-body newline)?
    (unicode-space | ws-escape)* '"""'
single-line-string-body := (string-character - newline)*
multi-line-string-body := (('"' | '""')? string-character)*
string-character :=
    '\\' (["\\bfnrts] |
    'u{' hex-unicode '}') |
    ws-escape |
    [^\\"] - disallowed-literal-code-points
ws-escape := '\\' (unicode-space | newline)+
hex-digit := [0-9a-fA-F]
hex-unicode := hex-digit{1, 6} - surrogates
surrogates := [dD][8-9a-fA-F]hex-digit{2}
// U+D800-DFFF: D 8 00
//              D F FF

raw-string := '#' raw-string-quotes '#' | '#' raw-string '#'
raw-string-quotes :=
    '"' single-line-raw-string-body '"' |
    '"""' newline
    (multi-line-raw-string-body newline)?
    unicode-space* '"""'
single-line-raw-string-body :=
    '' |
    (single-line-raw-string-char - '"')
        single-line-raw-string-char*? |
    '"' (single-line-raw-string-char - '"')
        single-line-raw-string-char*?
single-line-raw-string-char :=
    unicode - newline - disallowed-literal-code-points
multi-line-raw-string-body :=
    (unicode - disallowed-literal-code-points)*?

// Numbers
number := keyword-number | hex | octal | binary | decimal

decimal := sign? integer ('.' integer)? exponent?
exponent := ('e' | 'E') sign? integer
integer := digit (digit | '_')*
digit := [0-9]
sign := '+' | '-'

hex := sign? '0x' hex-digit (hex-digit | '_')*
octal := sign? '0o' [0-7] [0-7_]*
binary := sign? '0b' ('0' | '1') ('0' | '1' | '_')*

// Keywords and booleans.
keyword := boolean | '#null'
keyword-number := '#inf' | '#-inf' | '#nan'
boolean := '#true' | '#false'

// Specific code points
bom := '\u{FEFF}'
disallowed-literal-code-points :=
    See Table (Disallowed Literal Code Points)
unicode := Any Unicode Scalar Value
unicode-space := See Table
    (All White_Space unicode characters which are not `newline`)

// Comments
single-line-comment := '//' ^newline* (newline | eof)
multi-line-comment := '/*' commented-block
commented-block :=
    '*/' | (multi-line-comment | '*' | '/' | [^*/]+) commented-block
slashdash := '/-' line-space*

// Whitespace
ws := unicode-space | multi-line-comment
escline := '\\' ws* (single-line-comment | newline | eof)
newline := See Table (All Newline White_Space)
// Whitespace where newlines are allowed.
line-space := node-space | newline | single-line-comment
// Whitespace within nodes,
// where newline-ish things must be esclined.
node-space := ws* escline ws* | ws+

// Version marker
version :=
    '/-' unicode-space* 'kdl-version' unicode-space+ ('1' | '2')
    unicode-space* newline¶
The grammar language syntax is a combination of ABNF with some regex spice thrown in. Specifically:¶
Single quotes (') are used to denote literal text. \ within a literal string is used for escaping other single-quotes, for initiating unicode characters using hex values (\u{FEFF}), and for escaping \ itself (\\).¶
* is used for "zero or more", + is used for "one or more", and ? is used for "zero or one". Per standard regex semantics, * and + are greedy; they match as many instances as possible without failing the match.¶
*? (used only in raw strings) indicates a non-greedy match; it matches as few instances as possible without failing the match.¶
¶ is a cut point. It always matches and consumes no characters, but once matched, the parser is not allowed to backtrack past that point in the source. If a parser would rewind past the cut point, it must instead fail the overall parse, as if it had run out of options. (This is only used with the raw-string production, to ensure the first instance of the appropriate closing quote sequence is guaranteed to be the end of the raw string, rather than allowing it to potentially consume more of the document unexpectedly.)¶
() can be used to group matches that must be matched together.¶
a | b means a or b, whichever matches first. If multiple items are before a |, they are a single group. a b c | d is equivalent to (a b c) | d.¶
[] are used for regex-style character matches, where any character between the brackets will be a single match. \ is used to escape \, [, and ]. They also support character ranges (0-9), and negation (^).¶
- is used for "except for" or "minus" whatever follows it. For example, a - 'x' means "any a, except something that matches the literal 'x'".¶
The prefix ^ means "something that does not match" whatever follows it. For example, ^foo means "must not match foo".¶
A single definition may be split over multiple lines. Newlines are treated as spaces.¶
// followed by text on its own line is used as comment syntax.¶