The following specification defines a mechanism for defining character sets (charsets) which may be used in a Kixt transmission or rendering system. Kixt charsets are, as their name implies, sets of Kixt characters, mapping each one to a codepoint and assigning it particular character properties. This document details the meaning of these properties within the Kixt Charset Model. Finally, this document introduces a plaintext document format, the Kixt Charset Definition, for describing such charsets.
The purpose of Kixt is to reduce the barriers of entry for developing unconventional, experimental encodings and text processing systems, particularly those of characters or scripts which are unlikely to gain mainstream acceptance. The Kixt Charset Model, defined in this document, is one piece in an interlocking set of specifications working towards this goal.
The Kixt Charset Model is not, and does not attempt to be, a replacement for Unicode. In fact, every assigned Kixt character is required to have a defined Unicode mapping. If you are building an application which needs to process characters in a wide variety of scripts, languages, and/or directionalities, Unicode is the correct solution for you.
This document is part of the Kixt family of specifications. It is also built upon the technologies of R·D·F and O·W·L. It makes minor use of the Ordered List Ontology for defining its ordered lists.
In this document, the following prefixes are used to represent the following strings:
Prefix | Expansion |
---|---|
kixt: |
https://spec.go.kibi.family/ns/kixt/# |
i18n: |
https://www.w3.org/ns/i18n# |
olo: |
http://purl.org/ontology/olo/core# |
rdf: |
http://www.w3.org/1999/02/22-rdf-syntax-ns# |
xsd: |
http://www.w3.org/2001/XMLSchema# |
A Kixt character is an abstract representation of a unit of text, with associated properties and assignment to a single codepoint.
In Kixt, all codepoints are integer values between 0
and 65535
, inclusive.
A Kixt charset is a collection of Kixt characters.
Within a Kixt charset, characters may be collected into blocks. Blocks may have “gaps,” but must otherwise be contiguous across assigned characters; this is enforced by the Kixt Charset Definition syntax. Each character may belong to only one block.
Similarly, Kixt characters may be collected into scripts. Scripts are intended to allow for searching, accessing, and categorizing characters based on their histories or communities of usage. This specification makes no requirements on the structure of scripts, except to specify that each character may only belong to one.
The Kixt Charset Model is an R·D·F graph which associates charsets, characters, and their properties with one another according to the rules outlined in this document. The vocabulary for this model is the Kixt Ontology (O·W·L document), which normatively defines several classes and properties for use with the Model.
Although the Kixt Charset Model is defined using R·D·F and O·W·L, it is not expected that most or even many applications which make use of it will be fully‐fledged OWL reasoners. It is not presently recommended that applications interfacing with Kixt charsets load their information from anything other than a Kixt Charset Definition, whose resultant R·D·F graph is weldefined by this specification and may be abstracted as required. (However, its basis in R·D·F and O·W·L means that extensions to the Kixt Charset Model, as well as alternate means of loading and processing Kixt charsets, may conceivably be designed in the future.)
There are two major types of resource in the Kixt Charset Model: charsets and characters.
These are represented by the classes kixt:Charset
and kixt:Character
.
In addition to these, kixt:Block
and kixt:Script
represent blocks and scripts of characters, respectively.
Only kixt:Charset
need explicitly be declared as the rdf:type
of a resource; the remaining classes are implied through an instance's assignment to various properties (kixt:character
, kixt:block
, kixt:script
).
Many other classes of resource are defined in the Kixt Ontology. However, knowledge of their existence is not required for the processes described in this document.
As an R·D·F‐based model, the Kixt Charset Model naturally inherits the various datatypes introduced in R·D·F Concepts, including those inherited from X·S·D Datatypes and elsewhere. It also introduces five new datatypes:
kixt:BasicType
: One of the following xsd:anyURI
s:
https://spec.go.kibi.family/ns/kixt/#UNASSIGNED
https://spec.go.kibi.family/ns/kixt/#CONTROL
https://spec.go.kibi.family/ns/kixt/#MESSAGING
https://spec.go.kibi.family/ns/kixt/#FORMAT
https://spec.go.kibi.family/ns/kixt/#DATA
https://spec.go.kibi.family/ns/kixt/#NONSPACING
https://spec.go.kibi.family/ns/kixt/#SPACING
https://spec.go.kibi.family/ns/kixt/#PRIVATEUSE
https://spec.go.kibi.family/ns/kixt/#NONCHARACTER
kixt:Name
: An xsd:string
matching the <Name>
production below.
kixt:Scalar
: An xsd:unsignedInt
with a maxInclusive
restriction of 1114111
.
kixt:String
: An xsd:string
matching the <String>
production below.
kixt:Ternary
: An xsd:string
whose value is either YES
, NO
, or the empty string.
As O·W·L datatypes, you should not use these to type R·D·F literals; use xsd:anyURI
, xsd:string
, xsd:integer
or similar instead.
However, they are useful for expressing constraints on relations.
CharsetDefinition =
[%xFEFF] CharsetDeclaration *(
BlockDeclaration
/ ScriptDeclaration
/ CharacterDefinition
/ Comment
/ Blank
)
A Kixt Charset Definition is a U·T·F‐8 or U·T·F‐16–encoded file (determined by a leading B·O·M, and defaulting to U·T·F‐8 if no B·O·M is present) consisting of any number of block or script declarations, interspersed with any number of codepoint definitions, comments, or lines of whitespace. This section describes the syntax of such documents, in A·B·N·F and prose. It also describes how processors can use Kixt Charset Definitions to generate R·D·F graphs.
The file extension .kichar
or .kch
is suggested for Kixt Charset Definitions.
Programs which process Kixt Charset Definitions must ignore any U+0000 NULL
characters which appear in a document, behaving as though they were not present.
This effectively allows a U·T·F‐16‐encoded document to be processed without a B·O·M if it only contains codepoints in the A·S·C·I·I range.
Break = %x2028
; A Unicode line separator
; This is effectively just a placeholder for whatever manner of linebreak a document happens to use; see spec description
U+2028 LINE SEPARATOR
is the only formally recognized line separator in a Kixt Charset Definition.
However, to ease in the use of Kixt Charset Definitions on platforms for which support for U+2028 LINE SEPARATOR
is lacking, programs which process Kixt Charset Definitions must behave as though they normalized all of the following, on input and before parsing, to a single U+2028 LINE SEPARATOR
:
<U+000D, U+000A>
<U+000D, U+0085>
U+000A LINE FEED
U+0085 NEXT LINE
U+000D CARRIAGE RETURN
character that is not immediately followed by U+000A LINE FEED
or U+0085 NEXT LINE
.Space = %x20
; An ordinary Unicode breaking space
Blank = *Space Break
; A line consisting only of whitespace
U+0020 SPACE
is the only space character in a Kixt Charset Definition.
Spaces and line separators are not interchangeable.
ECMA6Char = %x21-22 / %x25-3F / %x41-5A / %x5F / %x61-7A
; The invariant printing characters from ECMA-6 (also ISO/IEC 646); documents written using only these characters (plus some manner of ASCII linebreak as described above) will be comprehensible in any ECMA-6 character set
; The core Kixt Character Definition syntax (with the exception of linebreaks) only uses characters from this set
ASCIIChar = %x21-7E
; Non-control, non-space ASCII characters
UCSChar =
%xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
; Basic Multilingual Plane
; (minus surrogates and specials)
/ %x10000-1FFFD
; Supplementary Multilingual Plane
/ %x20000-2FFFD
; Supplementary Ideographic Plane
/ %x30000-3FFFD
; Tertiary Ideographic Plane
/ %x40000-4FFFD
; Plane 4
/ %x50000-5FFFD
; Plane 5
/ %x60000-6FFFD
; Plane 6
/ %x70000-7FFFD
; Plane 7
/ %x80000-8FFFD
; Plane 8
/ %x90000-9FFFD
; Plane 9
/ %xA0000-AFFFD
; Plane 10
/ %xB0000-BFFFD
; Plane 11
/ %xC0000-CFFFD
; Plane 12
/ %xD0000-DFFFD
; Plane 13
/ %xE1000-EFFFD
; Supplementary Special-purpose Plane
; Note the first 1000 characters are excluded
; Non-special, non-privateuse, non-ASCII characters
; This is the same definition as used for IRIs
PrivateUse = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
; Unicode private-use characters
Specials, such as controls or noncharacters, are not allowed in a Kixt Charset Definition.
(This does not prevent you from using their codepoints in the <UnicodeMapping>
of <CharacterDefinition>
s.)
NoSlash = %x20-2E %x30-7E / UCSChar / PrivateUse
NoSpace = ASCIIChar / UCSChar / PrivateUse
NoBreak = NoSpace / Space
AnyChar = NoBreak / Break
You may, generally speaking, use non‐A·S·C·I·I and private‐use characters in a Kixt Charset Definition, except in restricted productions like <Name>
.
Zero = %x30
One = %x31
Bit = Zero / One
NonZero = %x31-39
Decimal = %x30-39
Decimal numbers use the standard A·S·C·I·I digits.
NonZeroHex = %x31-39 / %x41-46
UpperHex = %x30-39 / %x41-46
Hex = %x30-39 / %x41-46 / %x61-66
; Case-insensitive hexadecimal used by IRI syntax
Hexadecimal numbers (<UpperHex>
) use only the digits 0
–9
and the uppercase A·S·C·I·I letters A
–F
.
The <Hex>
production, which also allows lowercase a
–f
, is only used in <IRI>
s.
UpperAlpha = %x41-5A
Alpha = %x41-5A / %x61-7A
A number of productions in Kixt Charset Definitions use only the capital A·S·C·I·I alphabetic characters (<UpperAlpha>
).
<IRI>
s make use of the small A·S·C·I·I alphabetic characters as well (<Alpha>
).
IRI =
URI-scheme %x3A IRI-hier-part
[%x3F IRI-query]
[%x23 IRI-fragment]
IRI-hier-part =
%x2F.2F IRI-authority IRI-path-abempty
/ IRI-path-absolute
/ IRI-path-rootless
/ IRI-path-empty
IRI-authority =
[IRI-userinfo %x40]
IRI-host
[%x3A URI-port]
IRI-userinfo =
*(
IRI-unreserved
/ URI-pct-encoded
/ URI-sub-delims
/ %x3A
)
IRI-host =
URI-IP-literal
/ URI-IPv4address
/ IRI-reg-name
IRI-reg-name =
*(
IRI-unreserved
/ URI-pct-encoded
/ URI-sub-delims
)
IRI-path-abempty = *(%x2F IRI-segment)
IRI-path-absolute =
%x2F [
IRI-segment-nz
*(%x2F IRI-segment)
]
IRI-path-rootless = IRI-segment-nz *(%x2F IRI-segment)
IRI-path-empty = 0IRI-pchar
IRI-segment = *IRI-pchar
IRI-segment-nz = 1*IRI-pchar
IRI-pchar =
IRI-unreserved
/ URI-pct-encoded
/ URI-sub-delims
/ %x3A-40
IRI-query = *(IRI-pchar / Privateuse / %x2F / %x3F)
IRI-fragment = *(IRI-pchar / %x2F / %x3F)
IRI-unreserved =
Alpha / Decimal / %x2D-2E / %x5F / %x7E
/ UCSChar
URI-scheme = Alpha *(Alpha / Decimal / %x2B / %x2D-2E)
URI-port = *Decimal
URI-IP-literal =
%x5B (
URI-IPv6address
/ URI-IPvFuture
) %x5D
URI-IPvFuture =
(%x56 / %x76) 1*Hex %x2E
1*(URI-unreserved / URI-sub-delims / %x5B)
URI-IPv6address =
6(URI-h16 %x3A) URI-ls32
/ %x3A.3A
5(URI-h16 %x3A) URI-ls32
/ [URI-h16] %x3A.3A
4(URI-h16 %x3A) URI-ls32
/ [*1(URI-h16 %x3A) URI-h16] %x3A.3A
3(URI-h16 %x3A) URI-ls32
/ [*2(URI-h16 %x3A) URI-h16] %x3A.3A
2(URI-h16 %x3A) URI-ls32
/ [*3(URI-h16 %x3A) URI-h16] %x3A.3A
URI-h16 %x3A URI-ls32
/ [*4(URI-h16 %x3A) URI-h16] %x3A.3A
URI-ls32
/ [*5(URI-h16 %x3A) URI-h16] %x3A.3A
URI-h16
/ [*6(URI-h16 %x3A) URI-h16] %x3A.3A
URI-h16 = 1*4Hex
; 16 bits of address represented in hexadecimal
URI-ls32 = (URI-h16 %x3A URI-h16) / IPv4address
; Least significant 32 bits of address
URI-IPv4address =
URI-dec-octet
%x2E URI-dec-octet
%x2E URI-dec-octet
%x2E URI-dec-octet
URI-dec-octet =
%x32 %x35 %x30-35
; 250--255
/ %x32 %x30-34 Decimal
; 200--249
/ One 2Decimal
; 100--199
/ NonZero Decimal
; 10--99
/ Decimal
; 0--9
URI-pct-encoded = %x25 Hex Hex
URI-unreserved = Alpha / Decimal / %x2D-2E / %x5F / %x7E
URI-sub-delims = %x21 / %x24 / %x26-2C / %x3B / %x3D
I·R·Is must be <IRI>
s as defined by R·F·C 3987, from which the above productions were taken.
NonEmptyString = NoSpace *([Space] NoSpace)
String = [NonEmptyString]
; May be empty
CommentString = [NoSpace [*NoBreak NoSpace]]
; May be empty
Ordinary strings cannot begin or end with spaces, or have multiple spaces appear in sequence. Comment strings allow multiple spaces in sequence. Neither allows linebreaks.
NameChar = UpperAlpha / Decimal / %x20 / %x2D
; Valid characters for use in names
NameSpaceSequence =
Space (
UpperAlpha
; A letter
/ (%x2D (UpperAlpha / Decimal))
; A hyphen followed by a letter or digit
)
; Rules for including a space in a name
NameHyphenSequence =
%x2D (
UpperAlpha
; A letter
/ Decimal
; A digit
/ NameSpaceSequence
; A space (and then some) as defined above
)
; Rules for including a hyphen
Name =
UpperAlpha *(
UpperAlpha
/ Decimal
/ NameSpaceSequence
/ NameHyphenSequence
)
; Valid character, block, and script names
Names must start with an uppercase A·S·C·I·I letter, and may consist of uppercase A·S·C·I·I letters, A·S·C·I·I digits, U+002D HYPHEN-MINUS
, or spaces.
Hyphens and spaces must not end a name or appear in sequence.
Hyphens must not be surrounded by spaces, and digits must not be preceded by a space.
Codepoint =
*Zero NonZeroHex *3UpperHex
; Any nonzero value
/ 1*Zero
; Zero
UnicodeCodepoint =
%x55 %x2B
(
*Zero (One Zero / NonZeroHex) *4UpperHex
; Any nonzero value
/ 1*Zero
; Zero
)
BinaryCodepoint =
*(Zero [Space]) One *15([Space] Bit)
/ Zero *([Space] Zero)
Integer =
Zero
/ NonZeroHex *3UpperHex
The rules defined above are designed to facilitate first‐match‐wins, greedy matching.
Ordinary codepoints must be hexadecimal numbers in the range 0000
–FFFF
but may be preceded by any number of zeroes.
Unicode codepoints must be preceded with the string U+
and may be padded with any number of zeroes.
Only codepoints in the range U+0000..U+10FFFF
are valid Unicode codepoints.
Binary codepoints may have single spaces between their digits.
Integer values are expressed as hexadecimal numbers from 0
–FFFF
, with no leading zeroes.
SingleLineComment =
*Space %x2F
*Space CommentString
*Space Break
InnerCommentLine =
(
NoSlash
/ %x2F NoSlash
/ %x2F.2F NoSlash
/ %x2F.2F.2F NoBreak
) *NoBreak Break
MultiLineComment =
%x2E.2E.2E Break
*InnerCommentLine
%x2F.2F.2F Break
Comment =
SingleLineComment
/ MultiLineComment
A single·line comment is a single line beginning with U+002F SOLIDUS
and then followed by any number of other characters.
Multiline comments begin with three U+002E FULL STOP
characters and end with three U+002F SOLIDUS
characters.
Comments should be ignored during processing.
Note that multiline comments can only appear on the “top level” and not inside of character declarations or other productions.
The following productions are used in multiple types of declaration, with similar meanings.
Alias =
*Space %x3D
*Space Name
*Space Break
*SingleLineComment
Aliases = 1*Alias
An <Aliases>
gives alternate <Name>
s by which a kixt:Block
or kixt:Character
might be known.
It consists of one or more lines, each beginning with an U+003D EQUALS SIGN
, and followed by a <Name>
.
Upon reaching an <Aliases>
, for each <Name>
, create a new R·D·F triple with current parent as its subject, kixt:alias
as its predicate, and the value of the <Name>
as its object, as an xsd:string
.
OtherName =
*Space %x2D
*Space NonEmptyString
*Space Break
*SingleLineComment
OtherNames = 1*OtherName
An <OtherNames>
gives alternate names for a kixt:Charset
, kixt:Block
, kixt:Script
, or kixt:Character
, which may be more freeform than the <Name>
production allows.
It consists of one or more lines, each beginning with an U+002D HYPHEN-MINUS
, and followed by a <NonEmptyString>
.
Upon reaching an <OtherNames>
, for each <NonEmptyString>
, create a new R·D·F triple with current parent as its subject, kixt:alsoKnownAs
as its predicate, and the value of the <NonEmptyString>
as its object, as an xsd:string
.
Note =
*Space %x2A
*Space NonEmptyString
*Space Break
*SingleLineComment
Notes = 1*Note
A <Notes>
gives a freeform space for adding informative notes to a kixt:Charset
, kixt:Block
, kixt:Script
, or kixt:Character
.
It consists of one or more lines, each beginning with an U+002A ASTERISK
, and followed by a <NonEmptyString>
.
Upon reaching a <Notes>
, for each <NonEmptyString>
, create a new R·D·F triple with current parent as its subject, kixt:note
as its predicate, and the value of the <NonEmptyString>
as its object, as an xsd:string
.
CharsetDeclaration =
CharsetIdentifier
CharsetProperties
[OtherNames]
[Notes]
A <CharsetDeclaration>
defines the Kixt Charset Definition's kixt:Charset
.
It must be the first thing in a Kixt Charset Definition, after an optional U+FEFF Byte Order Mark
, with no leading spaces or breaks.
CharsetIdentifier =
%x3B.43.48.41.52.53.45.54.3C
; `;CHARSET<`
IRI %x3E [
Integer
[%x2E Integer]
] *Space Break
*SingleLineComment
A <CharsetIdentifier>
defines the I·R·I and version for a Kixt Charset Definition's kixt:Charset
.
Upon reaching a <CharsetIdentifier>
, set the current charset to the IRI specified by <IRI>
.
Set current parent to current charset.
Create an R·D·F triple with the current charset as its subject, rdf:type
as its predicate, and kixt:Charset
as its object.
If a first <Integer>
is present, create an R·D·F triple with the current charset as its subject, kixt:version
as its predicate, and the value of the first <Integer>
as its object, as an xsd:integer
.
If a second <Integer>
is present, create an RDF triple with the current charset as its subject, kixt:revision
as its predicate, and the value of the second <Integer>
as its object, as an xsd:integer
.
Finally, set current script to i18n:zzzz
; this is the default script.
Variable = %x56.41.52.49.41.42.4C.45
; `VARIABLE`
CharsetProperties =
[
*Space %x26
*Space Variable
*Space Break
*SingleLineComment
]
A <CharsetProperties>
defines additional properties on a kixt:Character
.
At the moment, the only additional property defined is a promise as to whether the character set is variable‐width‐compatible.
This property must be present if the production is nonempty.
<CharsetProperties>
begins with an U+0026 AMPERSAND
.
Upon reaching a <CharsetProperties>
:
kixt:supportsVariableEncoding
as its predicate, and an object of true
, as an xsd:boolean
, if <Variable>
is present, and false
, as an xsd:boolean
, otherwise.BlockDeclaration =
BlockName
[Aliases]
[OtherNames]
[Notes]
A <BlockDeclaration>
defines a new kixt:Block
.
BlockName =
*Space %x25
*Space Name
*Space Break
*SingleLineComment
A <BlockName>
names a kixt:Block
.
It begins with a U+0025 PERCENT SIGN
, which is followed by the block name.
The special name NO BLOCK
signifies no block.
A <BlockDeclaration>
with a <Name>
of NO BLOCK
must not have a <Aliases>
, <OtherNames>
, or <Notes>
.
Upon reaching a <BlockName>
, if the value of <Name>
is NO BLOCK
, set in a block to false.
Otherwise, set in a block to true, set current block to a new blank node, and set current parent to current block.
If in a block is true, create a new R·D·F triple with current block as its subject, kixt:name
as its predicate, and the value of <Name>
as its object, as a xsd:string
.
ScriptDeclaration =
ScriptIdentifier
[OtherNames]
[Notes]
A <ScriptDeclaration>
defines a new kixt:Script
.
ScriptIdentifier =
*Space %x27
*Space %x3C IRI %x3E
*Space Break
*SingleLineComment
A <ScriptIdentifier>
sets the I·R·I for the current kixt:Script
.
It begins with a U+0027 APOSTROPHE
, which is followed by the script <IRI>
.
Three special scripts are defined:
i18n:zyyy
, for characters which do not belong to a single scripti18n:zinh
, for combining characters which inherit their script from some base characteri18n:zzzz
, for characters whose script is unknownUpon reaching a <ScriptIdentifier>
, set current script to <IRI>
and current parent to current script.
CharacterDefinition =
UnicodeMapping
CharacterInfo
LinebreakProperties
CompatibilityMapping
DecompositionMapping
AdditionalProperties
[Aliases]
[OtherNames]
[Notes]
[References]
[Glyphs]
A <CharacterDefinition>
defines a single kixt:Character
.
The <UnicodeMapping>
and <CharacterInfo>
productions are required; the <CompatibilityMapping>
, <DecompositionMapping>
, and <AdditionalProperties>
productions are required but may be empty; all other productions are optional but must be specified in the order above.
Upon reaching a <CharacterDefinition>
, set current character to a new blank node.
Set current parent to current character.
Create a new R·D·F triple with current charset as its subject, kixt:character
as its predicate, and current character as its object.
If in a block is true, create a new R·D·F triple with current character as its subject, kixt:block
as its predicate, and current block as its object.
Create a new R·D·F triple with current character as its subject, kixt:script
as its predicate, and current script as its object.
Unicode =
*Space UnicodeCodepoint
[1*Space CommentString]
*Space Break
*SingleLineComment
UnicodeMapping = 1*Unicode
A <UnicodeMapping>
defines a the Unicode scalar values to which a given kixt:Character
maps.
These are given as <UnicodeCodepoint>
s, one per line, each optionally followed by a comment.
The Kixt Charset Definition format requires every <CharacterDefinition>
to have a <UnicodeMapping>
.
The character U+FFFD REPLACEMENT CHARACTER
can be used in situations where no mapping is desired.
However, the use of private‐use mappings is generally preferable.
Upon reaching a <UnicodeMapping>
, set current sequence to a new blank node.
Create a new R·D·F triple with current sequence as its subject, olo:length
as its predicate, and total number of <UnicodeCodepoint>
s within the <UnicodeMapping>
as its object, as an xsd:integer
.
For each <UnicodeCodepoint>
:
Set current slot to a new blank node.
Create a new R·D·F triple with current slot as its subject, olo:index
as its predicate, and the one-based index of the <UnicodeCodepoint>
within the <UnicodeMapping>
as its object, as an xsd:integer
.
Set current item to a new blank node.
Create a new R·D·F triple with current item as its subject, rdf:value
as its predicate, and the value of the <UnicodeCodepoint>
as its object, as an xsd:integer
.
Create a new R·D·F triple with current slot as its subject, olo:item
as its predicate, and current item as its object.
Create a new R·D·F triple with current sequence as its subject, olo:slot
as its predicate, and current slot as its object.
Finally, create a new R·D·F triple with current character as its subject, kixt:unicode
as its predicate, and current sequence as its object.
In Turtle, the resulting R·D·F graph produced by the above steps will look something like the following:
[ ] kixt:unicode [
olo:length 1;
olo:slot [
olo:index 1 ;
olo:item [ rdf:value 65533 ] ] ]
BasicType =
%x43.4F.4E.54.52.4F.4C
; `CONTROL`
/ %x4D.45.53.53.41.47.49.4E.47
; `MESSAGING`
/ %x46.4F.52.4D.41.54
; `FORMAT`
/ %x44.41.54.41
; `DATA`
/ %x4E.4F.4E.53.50.41.43.49.4E.47
; `NONSPACING`
/ %x53.50.41.43.49.4E.47
; `SPACING`
/ %x50.52.49.56.41.54.45.55.53.45
; `PRIVATEUSE`
/ %x4E.4F.4E.43.48.41.52.41.43.54.45.52
; `NONCHARACTER`
; It is not possible to define an `UNASSIGNED` character
CharacterInfo =
*Space %x3B
*Space (BinaryCodepoint *Space %x2F / Codepoint Space)
*Space Name
*Space %x28 BasicType %x29
*Space Break
*SingleLineComment
A <CharacterInfo>
defines the basic aspects of a kixt:Character
.
It begins with a U+003B SEMICOLON
, followed by a codepoint in either hexadecimal or binary, followed by the name and basic type of the character.
Upon reaching a <CharacterInfo>
, perform the following steps:
If there is already some subject node which is an object of the predicate kixt:character
on the subject current charset, for whom the object of the predicate kixt:codepoint
is the value of the <Codepoint>
or <BinaryCodepoint>
(whichever is present), replace current character with the first such node in all R·D·F triples in which current character is a subject or object, and set current character to this new node.
This handles the case where a kixt:Character
with this codepoint has already been created as part of a decomposition mapping.
Otherwise, create a new R·D·F triple with current character as its subject, kixt:codepoint
as its predicate, and the value of either <Codepoint>
or <BinaryCodepoint>
as its object, as an xsd:integer
.
Create a new R·D·F triple with current character as its subject, kixt:name
as its predicate, and the value of <Name>
as its object, as an xsd:string
.
Create a new R·D·F triple with current character as its subject, kixt:basicType
as its predicate, and the value of <BasicType>
, appended to the end of the string https://spec.go.kibi.family/ns/kixt/#
, as its object, as an xsd:anyURI
.
Note that this is a literal with a datatype I·R·I of xsd:anyURI
, not an R·D·F I·R·I.
LinebreakClass =
%x53.45.50.41.52.41.42.4C.45
; `SEPARABLE`
/ %x49.4E.53.45.50.41.52.41.42.4C.45
; `INSEPARABLE`
/ %x42.52.45.41.4B
; `BREAK`
/ %x4A.4F.49.4E
; `JOIN`
LinebreakProperties =
[
*Space %x3A
*Space Name
[*Space %x28 LinebreakClass [1*Space LinebreakClass] %x29]
*Space Break
*SingleLineComment
]
A <LinebreakProperties>
defines the linebreaking behaviour of a kixt:Character
.
It begins with a U+003A COLON
, followed by a <Name>
specifying the linebreak category, and optionally followed by one or two <LinebreakClass>
es, in parentheses, to override the default linebreaking behavior for the character.
Upon reaching a <LinebreakProperties>
, perform the following steps:
Create a new R·D·F triple with current character as its subject, kixt:linebreakCategory
as its predicate, and the value of <Name>
as its object, as an xsd:string
.
Create a new R·D·F triple with current character as its subject, kixt:breakBefore
as its predicate, and the value of either the first <LinebreakClass>
, if one is defined, or JOIN
, in either case appended to the end of the string https://spec.go.kibi.family/ns/kixt/#
, as its object, as an xsd:anyURI
.
Note that this is a literal with a datatype I·R·I of xsd:anyURI
, not an R·D·F I·R·I.
Create a new R·D·F triple with current character as its subject, kixt:breakAfter
as its predicate, and the value of either the last <LinebreakClass>
, if one is defined, or INSEPARABLE
, in either case appended to the end of the string https://spec.go.kibi.family/ns/kixt/#
, as its object, as an xsd:anyURI
.
Note that this is a literal with a datatype I·R·I of xsd:anyURI
, not an R·D·F I·R·I.
If only one <LinebreakClass>
is defined, kixt:breakBefore
and kixt:breakAfter
will have the same value for their objects.
CompatibilityMapping =
[
*Space %x28
[*Space %x3C IRI %x3E]
*Space Codepoint
*(1*Space Codepoint)
*Space Break
*SingleLineComment
]
A <CompatibilityMapping>
defines a compatibility decomposition for a kixt:Character
.
It begins with a U+0028 LEFT PARENTHESIS
, followed by an optional <IRI>
mode, followed by a sequence of <Codepoint>
s giving the mapping.
The entire production may be empty; if so, the character's compatibility decomposition is to itself.
The value kixt:GENERIC
indicates a generic compatibility mode and is the default.
Upon reaching a <CompatibilityMapping>
, set current sequence to a new blank node.
Create a new R·D·F triple with current sequence as its subject, olo:length
as its predicate, and the total number of <Codepoint>
s within the <CompatibilityMapping>
, or 1
if <CompatibilityMapping>
is empty, as its object, as an xsd:integer
.
For each <Codepoint>
, or if the <CompatibilityMapping>
is empty:
Set current slot to a new [blank node].
Create a new R·D·F triple with current slot as its subject, olo:index
as its predicate, and the one-based index of the <Codepoint>
within the <CompatibilityMapping>
, or 1
if <CompatibilityMapping>
is empty, as its object, as an xsd:integer
.
If <CompatibilityMapping>
is empty, set current item to current character.
Otherwise, if there is already some subject node which is an object of the predicate kixt:character
on the subject current charset, for whom the object of the predicate kixt:codepoint
is the value of the current <Codepoint>
, set current item to that node.
Otherwise :—
Set current item to a new blank node.
Create a new R·D·F triple with current charset as its subject, kixt:character
as its predicate, and current item as its object.
Create a new R·D·F triple with current item as its subject, kixt:codepoint
as its predicate, and the value of <Codepoint>
as its object, as an xsd:integer
.
Create a new R·D·F triple with current slot as its subject, olo:item
as its predicate, and current item as its object.
Create a new R·D·F triple with current sequence as its subject, olo:slot
as its predicate, and current slot as its object.
Create a new R·D·F triple with current sequence as its subject, kixt:compatibilityMode
as its predicate, and the value of <IRI>
, or https://spec.go.kibi.family/ns/kixt/#GENERIC
if <IRI>
is not present, as its object, as an I·R·I.
Finally, create a new R·D·F triple with current character as its subject, kixt:compatibility
as its predicate, and current sequence as its object.
In Turtle, the resulting R·D·F graph produced by the above steps will look something like the following:
[ ] kixt:compatibility [
olo:length 1;
kixt:compatibilityMode <example:mode> ;
olo:slot [
olo:index 1 ;
olo:item [ kixt:codepoint: 69 ] ] ]
DecompositionMapping =
[
(
*Space %x3C
*Space Codepoint
*(1*Space Codepoint)
*Space Break
) / (
*Space %x3C.3C
*Space Codepoint
1*(1*Space Codepoint)
*Space Break
)
*SingleLineComment
]
A <DecompositionMapping>
defines a canonical decomposition for a kixt:Character
.
It begins with either one or two U+003C LESS-THAN SIGN
s (indicating whether the decomposed form is preferred), followed by a sequence of <Codepoint>
s giving the mapping.
The entire production may be empty; if so, the character's canonical decomposition is to itself.
A decomposition mapping to a single character is always preferred, so the two‐U+003C LESS-THAN SIGN
form is only permitted when defining a mapping to two or more codepoints.
Upon reaching a <DecompositionMapping>
, set current sequence to a new blank node.
Create a new R·D·F triple with current sequence as its subject, olo:length
as its predicate, and total number of <Codepoint>
s within the <DecompositionMapping>
, or 1
if <DecompositionMapping>
is empty, as its object, as an xsd:integer
.
For each <Codepoint>
, or if the <DecompositionMapping>
is empty:
Set current slot to a new blank node.
Create a new R·D·F triple with current slot as its subject, olo:index
as its predicate, and the one-based index of the <Codepoint>
within the <DecompositionMapping>
, or 1
if <DecompositionMapping>
is empty, as its object, as an xsd:integer
.
If <DecompositionMapping>
is empty, set current item to current character.
If there is already some subject node which is an object of the predicate kixt:character
on the subject current charset, for whom the object of the predicate kixt:codepoint
is the value of the current <Codepoint>
, set current item to that node.
Otherwise :—
Set current item to a new blank node.
Create a new R·D·F triple with current charset as its subject, kixt:character
as its predicate, and current item as its object.
Create a new R·D·F triple with current item as its subject, kixt:codepoint
as its predicate, and the value of <Codepoint>
, converted from hexadecimal, as its object, as an xsd:integer
.
Create a new R·D·F triple with current slot as its subject, olo:item
as its predicate, and current item as its object.
Create a new R·D·F triple with current sequence as its subject, olo:slot
as its predicate, and current slot as its object.
Create a new R·D·F triple with current sequence as its subject, kixt:preferred
as its predicate, and an object of true
, as an xsd:boolean
, if <DecompositionMapping>
contains two U+003C LESS-THAN SIGN
s or only one <Codepoint>
, and false
, as an xsd:boolean
, otherwise.
Finally, create a new R·D·F triple with current character as its subject, kixt:decomposition
as its predicate, and current sequence as its object.
In Turtle, the resulting R·D·F graph produced by the above steps will look something like the following:
[ ] kixt:decomposition [
olo:length 1;
kixt:preferred true ;
olo:slot [
olo:index 1 ;
olo:item [ kixt:codepoint: 105 ] ] ]
Deprecated = %x44.45.50.52.45.43.41.54.45.44
; `DEPRECATED`
CharacterWidth =
%x50.52.4F.50.4F.52.54.49.4F.4E.41.4C
; `PROPORTIONAL`
/ %x46.55.4C.4C.57.49.44.54.48
; `FULLWIDTH`
SegmentationClass =
%x45.58.54.45.4E.44.53
; `EXTENDS`
ConjoiningMode =
%x43.4F.4E.4A.4F.49.4E.53.3C
; `CONJOINS<`
IRI %x3E
Conjoins = ConjoiningMode [Integer]
Combines = SegmentationClass [%x2B Integer]
AdditionalProperties =
[
*Space %x26
*Space (
Deprecated [
1*Space (
CharacterWidth [
1*Space ConjoiningMode
]
/ Conjoins
/ Combines
)
]
/ CharacterWidth [
1*Space ConjoiningMode
]
/ Conjoins
/ Combines
)
*Space Break
*SingleLineComment
]
An <AdditionalProperties>
defines a number of additional properties on a kixt:Character
; in order, these are whether the character is deprecated, whether the character is fullwidth or proportional, whether the character conjoins with previous characters of a similar type, and whether the character is a combining character.
All of these elements are optional, but at least one must be present if the production is nonempty as a whole.
<AdditionalProperties>
begins with an U+0026 AMPERSAND
.
Upon reaching an <AdditionalProperties>
:
Create a new R·D·F triple with current character as its subject, kixt:deprecated
as its predicate, and an object of true
, as an xsd:boolean
, if <DecompositionMapping>
contains <Deprecated>
, and false
, as an xsd:boolean
, otherwise.
Create a new R·D·F triple with current character as its subject, kixt:fullwidth
as its predicate, and an object of YES
, as an xsd:string
, if <CharacterWidth>
is FULLWIDTH
; NO
, as an xsd:string
, if <CharacterWidth>
is PROPORTIONAL
; and an empty xsd:string
if <CharacterWidth>
is not present.
Create a new R·D·F triple with current character as its subject, kixt:extends
as its predicate, and an object of true
, as an xsd:boolean
, if <Combines>
is present, and false
, as an xsd:boolean
, otherwise.
Create a new R·D·F triple with current character as its subject, kixt:combiningClass
as its predicate, and the value of <Integer>
in <Combines>
, if present, or 0
, otherwise, as its object, as an xsd:integer
.
If <ConjoiningMode>
is present (including within a <Conjoins>
), create a new R·D·F triple with current character as its subject, kixt:conjoiningMode
as its predicate, and the value <ConjoiningMode>
as its object, as an I·R·I.
Create a new R·D·F triple with current character as its subject, kixt:conjoiningClass
as its predicate, and the value of <Integer>
in <Conjoins>
, if present, or 0
, otherwise, as its object, as an xsd:integer
.
Reference =
*Space %x3E
*Space Codepoint
[1*Space CommentString]
*Space Break
*SingleLineComment
References = 1*Reference
A <References>
allows a kixt:Character
to be associated with other kixt:Character
s for the purposes of comparison.
It consists of one or more lines, each beginning with an U+003E GREATER-THAN SIGN
, and followed by a <Codepoint>
and optional comment.
Upon reaching a <References>
, for each <Codepoint>
:
If there is already some subject node, which is an object of the predicate kixt:character
on the subject current charset, for whom the object of the predicate kixt:codepoint
is the value of the current <Codepoint>
, set current item to that node.
Otherwise :—
Set current item to a new blank node.
Create a new R·D·F triple with current charset as its subject, kixt:character
as its predicate, and current item as its object.
Create a new R·D·F triple with current item as its subject, kixt:codepoint
as its predicate, and the value of <Codepoint>
, converted from hexadecimal, as its object, as an xsd:integer
.
Create a new R·D·F triple with current character as its subject, kixt:compare
as its predicate, and current item as its object.
HexGlyph = 8*UpperHex
Glyph =
*Space %x29
*Space HexGlyph
*Space Break
*SingleLineComment
Glyphs = 1*Glyph
A <Glyphs>
provides a lofi black & white representative glyphs for a kixt:Character
.
It consists of one or <Glyph>
s, each beginning with a U+0029 RIGHT PARENTHESIS
and consisting of one or more lines of binary data represented in hexadecimal.
Each bit of this data represents a pixel, with 0
indicating the background colour and 1
the foreground, beginning from the starting (in both horizontal and vertical directions, and travelling in the direction of the writing mode) corner of the glyph.
Upon reaching a <Glyphs>
, for each <Glyph>
, create a new R·D·F triple with current character as its subject, kixt:representativeGlyph
as its predicate, and the value of <HexGlyph>
as its object, as a xsd:hexBinary
padded with additional terminal zeroes as necessary until the total length is even, and the total length times either four or eight is square.
This effectively limits representative glyphs to having a height divisible by 4.
A Kixt Charset Definition is welformed if it matches the A·B·N·F syntax for <CharsetDefinition>
defined by this specification.
Processors of Kixt Charset Definitions must fail to process any Kixt Charset Definition which is not well-formed.
In addition to the constraints made by the A·B·N·F syntax, the following situations are all semantically invalid in a Kixt Charset Definition:
Two or more <BlockDeclaration>
s with identical <Name>
s, when both are not NO BLOCK
.
An <Aliases>
, <OtherNames>
, or <Notes>
in a <BlockDeclaration>
whose <Name>
is NO BLOCK
.
A <LinebreakProperties>
in a <CharacterDefinition>
which does not have a <BasicType>
of SPACING
.
A <Combines>
, <Conjoins>
, or <CharacterWidth>
in a <CharacterDefinition>
which does not have a <BasicType>
of SPACING
or NONSPACING
.
Assigning an object other than 0
for the kixt:combiningClass
predicate for a subject whose kixt:extends
predicate is not true
or whose kixt:basicType
predicate is not kixt:NONSPACING
.
Assigning an object other than https://spec.go.kibi.family/ns/kixt/#GENERIC
for the kixt:compatibilityMode
predicate for a subject whose kixt:compatibility
predicate has an object with one kixt:slot
predicate whose object has one kixt:item
predicate whose object is the subject itself.
In other words, if a character has a compatibility decomposition of itself, then it must have the default compatibility mode of kixt:GENERIC
.
Assigning the same value as the object of a kixt:name
or kixt:alias
predicate for two different subjects of the same rdf:type
(kixt:name
and kixt:alias
must be unique within a shared namespace).
Assigning i18n:zinh
as the object of a kixt:script
predicate while processing a <CharacterDefinition>
which does not contain a <Combines>
.
Assigning the same object for a kixt:codepoint
predicate while processing two different <CharacterInfo>
s.
Assigning the multiple objects with the same length for a kixt:representativeGlyph
predicate for a single subject.
Finishing processing the Kixt Charset Definition when not every kixt:character
predicate with a subject of current charset has an object for which a kixt:basicType
predicate has been assigned.
Another way of expressing this constraint is that every <Codepoint>
in a <CompatibilityMapping>
, <DecompositionMapping>
, or <Reference>
must identify a kixt:Character
defined in the same document.
Creating a kixt:Charset
which is not variable‐width‐compatible but for which kixt:variable
is true
.
A Kixt Charset Definition is valid if it is not invalid.
The processing behaviours for an invalid Kixt Charset Definition are undefined.
The following predicates are compatibility properties:
kixt:unicode
kixt:basicType
kixt:extends
kixt:combiningClass
kixt:conjoiningMode
(if defined on a character)kixt:conjoiningClass
A Kixt Charset Definition is U·T·F‐8‐compatible if it is valid and does not assign any of the following codepoints:
D800
–DFFF
FEFF
FFFE
–FFFF
A Kixt Charset Definition is null‐compatible if it is valid and the objects of the compatibility properties are equal to those defined in the following Kixt Charset Definition for all characters so defined:
;CHARSET<https://spec.go.kibi.family/-/kixt-charset/null>1.0
% ASCII CONTROLS AND BASIC LATIN
' <https://www.w3.org/ns/i18n#zyyy>
U+0000
; 00 NULL (FORMAT)
= NUL
* This is a meaningless format character which can be used for byte-padding when encoding texts
A Kixt Charset Definition is A·S·C·I·I‐compatible if it is valid and the objects of the compatibility properties are equal to those defined in https://spec.go.kibi.family/-/kixt-charset/ascii for all characters so defined.
All A·S·C·I·I‐compatible charsets are null compatible.
Added <LinebreakProperties>
.
Removed DIVIDER
as a <SegmentationClass>
and replaced kixt:segments
with kixt:extends
.
Required characters with a nonzero kixt:combiningClass
to be nonspacing.
New U·R·L’s and minor revisions.
kixt:TRANSMISSION
has been replaced by kixt:MESSAGING
as a kixt:basicType
.
Allowed the specification of other names and notes on charsets, blocks, and scripts, and aliases on blocks.
Added a variable‐width promise (kixt:supportsVariableEncoding
) to charset declarations.
The syntaxes for <Integer>
and <InnerCommentLine>
were improved.
Redefined a number of syntax components to make it possible to write a Kixt Charset Definition in any ECMA‐6‐compatible character set. By extension, you can now write a Kixt Charset Definition in any X·M·L‐compatible character set. (This is a breaking change.)
Allowed U·T·F‐16–encoded documents with the addition of a B·O·M.
Required processors to ignore U+0000 NULL
characters which appear in Kixt Charset Definition documents.
Removed compatibility definitions that are better‐served in other specifications.
Added <MultiLineComment>
s.
Allowed <SingleLineComment>
s to appear inside of <CharacterDefinition>
s.
Initial specification.