kibigo! :: Specifications

Kixt XML

Abstract

The following specification defines a means of creating X·M·L documents which are useable with Kixt charsets.

1. Introduction

1.1 Purpose and Scope

The Kixt Transmission Format allows for the association of metadata with documents through the use of headers. However, using this format to specify character sets in a manner compatible with non‐Kixt‐aware X·M·L processors is impossible, due to its reliance on characters prohibited in X·M·L documents. This specification defines a subset of the X·M·L syntax which may be used to encode texts written in a Kixt charset. The resulting document is called a Kixt X·M·L document.

The file extension .xkixt or .xkx is suggested for saved Kixt X·M·L documents.

1.2 Relationship to Other Specifications

This document is part of the Kixt family of specifications. It is also built upon X·M·L and R·D·F technologies.

2. Character sets

A Kixt Charset Definition is X·M·L‐compatible if it is U·T·F‐8 compatible and the objects of the compatibility properties are equal to those defined in https://spec.go.kibi.family/-/kixt-xml/charset for all characters so defined.

The Unicode character set, as well as the A·S·C·I·I subset thereof, is assumed to be X·M·L compatible. Whether other character sets are X·M·L compatible is left undefined by this specification.

The character set of a Kixt X·M·L document may be any X·M·L‐compatible character set. Kixt X·M·L documents must not be fully normalized, as they do not necessarily contain Unicode contents.

2.1 Character encoding

Kixt X·M·L documents must be transmitted as either Generalized U·T·F‐8, Fullwidth‐B·E, or Fullwidth‐L·E. Fullwidth‐B·E or Fullwidth‐L·E Kixt X·M·L documents must begin with the codepoint FEFF. Generalized U·T·F‐8 Kixt X·M·L documents may also begin with FEFF, but this is not required.

3. The Format

3.1 Restrictions on the X·M·L syntax

Kixt X·M·L documents must follow the syntax defined by X·M·L, with the following additional constraints:

  1. Kixt X·M·L documents must not contain the codepoints 000A, 000D, 0085, or 2028.

    As a consequence of this rule, the only valid X·M·L <S> whitespace in a Kixt X·M·L document is 0020. This does not prevent the presence of other, non‐syntactic whitespace, however.

  2. Kixt X·M·L documents must not contain any codepoints not defined in https://spec.go.kibi.family/-/kixt-xml/charset in any X·M·L <Name>, <NCName>, <Nmtoken>, or <PubidLiteral>.

  3. Kixt X·M·L documents must not contain an <EncodingDecl> encoding declaration.

  4. Kixt X·M·L documents must not contain any codepoints not assigned in the current character set.
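
Constraints 1 and 3 can be checked without any knowledge of the character set in use. The following Python sketch illustrates one way to do so. It is not part of this specification. The function name `find_violations` is hypothetical, the input is assumed to be an already decoded string of codepoints, and the regular expression is only a rough approximation of the X·M·L <EncodingDecl> production. Constraints 2 and 4 require a character set definition and are not covered.

```python
import re

# Codepoints which constraint 1 forbids anywhere in the document.
FORBIDDEN_CODEPOINTS = frozenset({0x000A, 0x000D, 0x0085, 0x2028})

# Rough approximation of an encoding declaration inside the XML declaration
# (constraint 3). An optional leading FEFF is tolerated.
ENCODING_DECL = re.compile(r'^\ufeff?<\?xml\s[^>]*?\bencoding\s*=')


def find_violations(text):
    """Return (index, description) pairs for violations of constraints 1 and 3."""
    violations = []
    for index, char in enumerate(text):
        if ord(char) in FORBIDDEN_CODEPOINTS:
            violations.append((index, "forbidden codepoint %04X" % ord(char)))
    if ENCODING_DECL.match(text):
        violations.append((0, "encoding declaration present"))
    return violations
```

A document which passes this check may still violate the remaining constraints, so an empty result is a necessary condition only.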

3.2 Defining the character set

The starting character set for a Kixt X·M·L document is Unicode.

The character set for the contents of any X·M·L element can be changed by setting the attribute with local name charset and namespace name https://spec.go.kibi.family/ns/kixt/ on that element. If the value of this attribute is the I·R·I of a supported, X·M·L‐compatible character set, then this is the character set of the element’s contents. Otherwise, the character set of the element’s contents is the same as that for its parent, or, in the case of the root element, the document as a whole.
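
The inheritance rule above can be illustrated with a short Python sketch using the standard `xml.etree.ElementTree` module. It is not part of this specification. The function name `resolve_charsets`, the `UNICODE` label for the starting character set, and the `https://example.org/…` I·R·Is are all hypothetical, and the `supported` set is assumed to contain only X·M·L‐compatible character sets.

```python
import xml.etree.ElementTree as ET

KIXT_NS = "https://spec.go.kibi.family/ns/kixt/"
CHARSET_ATTR = "{%s}charset" % KIXT_NS  # ElementTree's {namespace}local notation

UNICODE = "unicode"  # hypothetical label for the starting character set


def resolve_charsets(element, supported, inherited=UNICODE, out=None):
    """Collect (tag, charset of the element's contents) pairs in document order."""
    if out is None:
        out = []
    declared = element.get(CHARSET_ATTR)
    # A declared I·R·I only takes effect if it names a supported character set;
    # otherwise the contents keep the character set of the parent (or document).
    charset = declared if declared in supported else inherited
    out.append((element.tag, charset))
    for child in element:
        resolve_charsets(child, supported, charset, out)
    return out


# Hypothetical example document; note that it contains no line breaks.
DOCUMENT = (
    '<doc xmlns:k="https://spec.go.kibi.family/ns/kixt/">'
    '<p k:charset="https://example.org/charset-a"><q/></p>'
    '<r k:charset="https://example.org/unknown"/>'
    '</doc>'
)
SUPPORTED = {"https://example.org/charset-a"}
RESULT = resolve_charsets(ET.fromstring(DOCUMENT), SUPPORTED)
```

In this example the contents of `p` and its child `q` use the declared character set, while `doc` and `r` (whose I·R·I is not supported) fall back to the starting character set.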

4. Security

Using non‐Unicode character sets within a document may make scripted Kixt X·M·L documents more difficult to sanitize. Processors of scripted documents which may contain unsafe information are advised not to recognize any character set I·R·I, effectively locking the character set into Unicode, unless the character sets supported by the sanitization filter are known.
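
One way to apply this advice is to make recognition of a character set I·R·I depend on what is known about the sanitization filter. The following Python sketch is an illustration only. The function name `recognized_charset` and its parameters are hypothetical, and passing `None` is an assumed convention for "the filter's supported character sets are not known".

```python
def recognized_charset(declared_iri, inherited, sanitizer_charsets=None):
    """Return the character set in effect for an element's contents.

    `sanitizer_charsets` is the set of I·R·Is the sanitization filter is
    known to support, or None when that information is unavailable.
    """
    if sanitizer_charsets is None:
        # Nothing is known about the filter: recognize no I·R·I at all,
        # so the inherited character set (ultimately Unicode) stays in effect.
        return inherited
    if declared_iri in sanitizer_charsets:
        return declared_iri
    return inherited
```

With no information about the filter, every declaration is ignored and the document remains in its starting character set.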

5. Changelog

New U·R·Ls and minor revisions.

Added Security section.

Initial specification.