The following specification defines a means of creating X·M·L documents which are useable with Kixt charsets.
The Kixt Transmission Format allows for the association of metadata with documents through the use of headers. However, using this format to specify character sets in a manner compatible with non‐Kixt‐aware X·M·L processors is impossible, due to its reliance on characters prohibited in X·M·L documents. This specification defines a subset of the X·M·L syntax which may be used to encode texts written in a Kixt charset. The resulting document is called a Kixt X·M·L document.
The file extension .xkixt or .xkx is suggested for saved Kixt X·M·L documents.
This document is part of the Kixt family of specifications. It is also built upon X·M·L and R·D·F technologies.
A Kixt Charset Definition is X·M·L‐compatible if it is U·T·F‐8 compatible and the objects of the compatibility properties are equal to those defined in https://spec.go.kibi.family/-/kixt-xml/charset for all characters so defined.
The Unicode character set, as well as the A·S·C·I·I subset thereof, is assumed to be X·M·L compatible. Whether other character sets are X·M·L compatible is left undefined by this specification.
The character set of a Kixt X·M·L document may be any X·M·L‐compatible character set. Kixt X·M·L documents must not be fully normalized, as they do not necessarily contain Unicode contents.
Kixt X·M·L documents must be transmitted as either Generalized U·T·F‐8, Fullwidth‐B·E, or Fullwidth‐L·E.
Fullwidth‐B·E or Fullwidth‐L·E Kixt X·M·L documents must begin with the codepoint FEFF.
Generalized U·T·F‐8 Kixt X·M·L documents may also begin with FEFF, but this is not required.
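The FEFF rule above can be sketched as a check on a sequence of codepoints which has already been decoded. This is an illustrative sketch only: the function name `has_valid_bom` and the encoding labels are hypothetical, and the decoding of the transmission formats themselves is not shown.

```python
from typing import Sequence

BOM = 0xFEFF

# Hypothetical labels for the three permitted transmission formats.
FULLWIDTH = {"fullwidth-be", "fullwidth-le"}
GENERALIZED_UTF8 = "generalized-utf-8"


def has_valid_bom(encoding: str, codepoints: Sequence[int]) -> bool:
    """Return True if the leading codepoint satisfies the FEFF rule."""
    if encoding in FULLWIDTH:
        # Fullwidth documents must begin with FEFF.
        return len(codepoints) > 0 and codepoints[0] == BOM
    if encoding == GENERALIZED_UTF8:
        # FEFF is optional here, so any start is acceptable.
        return True
    # Any other transmission format is not permitted.
    return False
```

For instance, `has_valid_bom("fullwidth-be", [0xFEFF, 0x3C])` is true, while the same call without the leading FEFF is false.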
Kixt X·M·L documents must follow the syntax defined by X·M·L, with the additional constraints:
Kixt X·M·L documents must not contain the codepoints 000A, 000D, 0085, or 2028.
As a consequence of this rule, the only valid X·M·L <S> whitespace in a Kixt X·M·L document is 0020.
This does not prevent the presence of other, non‐syntactic whitespace, however.
Kixt X·M·L documents must not contain any codepoints not defined in https://spec.go.kibi.family/-/kixt-xml/charset in any X·M·L <Name>, <NCName>, <Nmtoken>, or <PubidLiteral>.
Kixt X·M·L documents must not contain an <EncodingDecl> encoding declaration.
Kixt X·M·L documents must not contain any codepoints not assigned in the current character set.
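The prohibition on the codepoints 000A, 000D, 0085, and 2028 lends itself to a simple scan. The following is a minimal sketch, assuming the document has already been decoded into a sequence of integer codepoints; the name `find_forbidden` is hypothetical, and the other constraints are not checked.

```python
from typing import Iterable, List, Tuple

# Codepoints which must not appear anywhere in a Kixt X·M·L document.
FORBIDDEN = frozenset({0x000A, 0x000D, 0x0085, 0x2028})


def find_forbidden(codepoints: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (index, codepoint) pairs for every forbidden codepoint."""
    return [(i, cp) for i, cp in enumerate(codepoints) if cp in FORBIDDEN]
```

An empty result means that the document satisfies this one constraint. For example, `find_forbidden(map(ord, "<a>\n</a>"))` reports the line feed at index 3.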
The starting character set for a Kixt X·M·L document is Unicode.
The character set for the contents of any X·M·L element can be changed by setting the attribute with local name charset and namespace name https://spec.go.kibi.family/ns/kixt/ on that element.
If the value of this attribute is the I·R·I of a supported, X·M·L‐compatible character set, then this is the character set of the element’s contents.
Otherwise, the character set of the element’s contents is the same as that for its parent, or, in the case of the root element, the document as a whole.
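The inheritance rule above might be sketched with Python's standard `xml.etree.ElementTree` module as follows. This is an illustration under stated assumptions, not a conforming processor: the name `resolve_charsets`, the `UNICODE` marker, and the `supported` parameter are hypothetical, and `supported` is assumed to contain only the I·R·Is of character sets which the caller already knows to be X·M·L‐compatible.

```python
import xml.etree.ElementTree as ET

KIXT_NS = "https://spec.go.kibi.family/ns/kixt/"
# ElementTree spells namespaced attribute names as "{namespace}local".
CHARSET_ATTR = "{" + KIXT_NS + "}charset"

# Hypothetical stand-in for the starting character set (Unicode).
UNICODE = "unicode"


def resolve_charsets(element, supported, inherited=UNICODE):
    """Map each element to the character set of its contents."""
    value = element.get(CHARSET_ATTR)
    # A recognized I·R·I switches the character set; anything else inherits.
    current = value if value in supported else inherited
    result = {element: current}
    for child in element:
        result.update(resolve_charsets(child, supported, current))
    return result
```

Calling `resolve_charsets(root, set())` recognizes no character set I·R·Is at all, so every element resolves to the Unicode marker.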
Using non‐Unicode character sets within a document may make scripted Kixt X·M·L documents more difficult to sanitize. It is advised that processors of scripted documents which may contain unsafe information fail to recognize all character set I·R·Is, effectively locking the character set into Unicode, unless the character sets supported by a sanitization filter are known.
New U·R·Ls and minor revisions.
Added Security section.
Initial specification.