XTCE12 — StringDataType UTF-8/CharacterWidth Issue

  • Key: XTCE12-110
  • Legacy Issue Number: 14478
  • Status: closed  
  • Source: NASA (Mr. James Kevin Rice)
  • Summary:

    Description Kevin Rice 2007-10-22 21:56:02 BST
    UTF-8 and UTF-16 are actually multi-byte formats. The character width field
    just says "8" or "16" somewhat implying bit width per character which doesn't
    really match UTF-8/16. We should match them up.

  • Reported: XTCE 1.1 — Thu, 17 Sep 2009 04:00 GMT
  • Disposition: Resolved — XTCE 1.2
  • Disposition Summary:

    Improve the ability to specify String Data Encoding

    Replace the simpleType StringEncodingType definition with the following definition containing a more comprehensive, annotated list of string encodings.

    <simpleType name="StringEncodingType">
      <annotation>
        <documentation xml:lang="en">Defines string encodings. US-ASCII (7-bit), ISO-8859-1 (8-bit Extended ASCII), Windows-1252 (8-bit Extended ASCII), UTF-8 (Unicode), UTF-16 (Unicode with Byte Order Mark), UTF-16LE (Unicode Little Endian), UTF-16BE (Unicode Big Endian), UTF-32 (Unicode with Byte Order Mark), UTF-32LE (Unicode Little Endian), UTF-32BE (Unicode Big Endian). See StringDataEncodingType.</documentation>
      </annotation>
      <restriction base="string">
        <enumeration value="US-ASCII"/>
        <enumeration value="ISO-8859-1"/>
        <enumeration value="Windows-1252"/>
        <enumeration value="UTF-8"/>
        <enumeration value="UTF-16">
          <annotation>
            <documentation xml:lang="en">With UTF-16, encoded bits must be prepended with a Byte Order Mark. This mark indicates whether the data is encoded in big or little endian.</documentation>
          </annotation>
        </enumeration>
        <enumeration value="UTF-16LE">
          <annotation>
            <documentation xml:lang="en">With UTF-16LE, encoded bits will always be represented as little endian. Bits are not prepended with a Byte Order Mark.</documentation>
          </annotation>
        </enumeration>
        <enumeration value="UTF-16BE">
          <annotation>
            <documentation xml:lang="en">With UTF-16BE, encoded bits will always be represented as big endian. Bits are not prepended with a Byte Order Mark.</documentation>
          </annotation>
        </enumeration>
        <enumeration value="UTF-32">
          <annotation>
            <documentation xml:lang="en">With UTF-32, encoded bits must be prepended with a Byte Order Mark. This mark indicates whether the data is encoded in big or little endian.</documentation>
          </annotation>
        </enumeration>
        <enumeration value="UTF-32LE">
          <annotation>
            <documentation xml:lang="en">With UTF-32LE, encoded bits will always be represented as little endian. Bits are not prepended with a Byte Order Mark.</documentation>
          </annotation>
        </enumeration>
        <enumeration value="UTF-32BE">
          <annotation>
            <documentation xml:lang="en">With UTF-32BE, encoded bits will always be represented as big endian. Bits are not prepended with a Byte Order Mark.</documentation>
          </annotation>
        </enumeration>
      </restriction>
    </simpleType>
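
    The byte-order notes in these annotations can be made concrete with a pair of instance fragments. This is only a sketch: it uses the SizeInBits/Fixed form defined later in this resolution, and the sample text "Hi", the bit counts, and the assumption that a fixed size includes the Byte Order Mark are illustrative choices that do not come from the issue text.

    ```xml
    <!-- Hypothetical fragments; the sample text "Hi" is U+0048 U+0069. -->

    <!-- encoding="UTF-16": a Byte Order Mark precedes the text.
         Big-endian bytes: FE FF 00 48 00 69 (48 bits, assuming the BOM is counted). -->
    <StringDataEncoding encoding="UTF-16">
      <SizeInBits>
        <Fixed>
          <FixedValue>48</FixedValue>
        </Fixed>
      </SizeInBits>
    </StringDataEncoding>

    <!-- encoding="UTF-16LE": no Byte Order Mark, always little endian.
         Bytes: 48 00 69 00 (32 bits). -->
    <StringDataEncoding encoding="UTF-16LE">
      <SizeInBits>
        <Fixed>
          <FixedValue>32</FixedValue>
        </Fixed>
      </SizeInBits>
    </StringDataEncoding>
    ```

    The same two characters need a different number of bits depending on the encoding chosen, which is why the encoding name alone cannot be read as a per-character width.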

    Next, StringDataEncodingType is updated to allow more flexible definitions of string size by adding a Variable element as an alternative to the SizeInBits element used in XTCE 1.1. SizeInBits is now reserved for static-length string parameters.

    from

    <complexType name="StringDataEncodingType">
      <annotation>
        <documentation xml:lang="en">For common encodings of string data</documentation>
      </annotation>
      <complexContent>
        <extension base="xtce:DataEncodingType">
          <sequence>
            <element name="SizeInBits" type="xtce:SizeInBitsType"/>
          </sequence>
          <attribute name="encoding" type="xtce:StringEncodingType" default="UTF-8"/>
        </extension>
      </complexContent>
    </complexType>

    to

    <complexType name="StringDataEncodingType">
      <annotation>
        <documentation xml:lang="en">Describe common encodings of string data: UTF-8 and UTF-16. See StringDataType.</documentation>
      </annotation>
      <complexContent>
        <extension base="xtce:DataEncodingType">
          <choice>
            <element name="SizeInBits" type="xtce:SizeInBitsType">
              <annotation>
                <documentation xml:lang="en">Static length strings do not change in overall length between samples. They may terminate before the end of their buffer using a terminating character, or by various lookups, or calculations. But they have a maximum fixed size, and the data itself is always within that maximum size.</documentation>
              </annotation>
            </element>
            <element name="Variable" type="xtce:VariableStringType">
              <annotation>
                <documentation xml:lang="en">A variable length string may change lengths between samples.</documentation>
              </annotation>
            </element>
          </choice>
          <attribute name="encoding" type="xtce:StringEncodingType" default="UTF-8"/>
        </extension>
      </complexContent>
    </complexType>
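
    As a rough illustration of the choice this definition introduces, the two fragments below describe the same UTF-8 string parameter once as a static-length string and once as a variable-length one. They are sketches only: the 128-bit buffer size is arbitrary, and the sizeInBitsOfSizeTag attribute is assumed from the published LeadingSizeType, which this issue record does not reproduce.

    ```xml
    <!-- Static length: the string always occupies a 128-bit (16-byte) buffer. -->
    <StringDataEncoding encoding="UTF-8">
      <SizeInBits>
        <Fixed>
          <FixedValue>128</FixedValue>
        </Fixed>
      </SizeInBits>
    </StringDataEncoding>

    <!-- Variable length: the length is carried in a size tag ahead of the text.
         sizeInBitsOfSizeTag is an assumed attribute name, see the note above. -->
    <StringDataEncoding encoding="UTF-8">
      <Variable>
        <LeadingSize sizeInBitsOfSizeTag="16"/>
      </Variable>
    </StringDataEncoding>
    ```

    Because the encoding attribute defaults to UTF-8, it could be omitted from both fragments. Exactly one of SizeInBits or Variable may appear, since the two elements sit inside a choice.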

    With the change to the StringDataEncodingType, it is necessary to add a new type definition for the Variable element.

    <complexType name="VariableStringType">
      <annotation>
        <documentation xml:lang="en">Describe a variable string whose length may change between samples.</documentation>
      </annotation>
      <choice>
        <element name="LeadingSize" type="xtce:LeadingSizeType"/>
        <element name="DynamicValue" type="xtce:DynamicValueType"/>
        <element name="TerminationChar" type="hexBinary"/>
        <element name="DiscreteLookupList" type="xtce:DiscreteLookupListType"/>
      </choice>
    </complexType>
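
    Two of the four alternatives might look as follows in an instance document. These fragments are a sketch under stated assumptions: NameLength is a hypothetical parameter name, the ParameterInstanceRef child and its parameterRef attribute are assumed from the published DynamicValueType rather than taken from this record, and the line-feed terminator is an arbitrary choice.

    ```xml
    <!-- Length taken from another parameter in the same container.
         NameLength is hypothetical; ParameterInstanceRef/parameterRef are assumed. -->
    <StringDataEncoding encoding="UTF-8">
      <Variable>
        <DynamicValue>
          <ParameterInstanceRef parameterRef="NameLength"/>
        </DynamicValue>
      </Variable>
    </StringDataEncoding>

    <!-- Length found by scanning for a terminator; 0A (line feed) is arbitrary. -->
    <StringDataEncoding encoding="US-ASCII">
      <Variable>
        <TerminationChar>0A</TerminationChar>
      </Variable>
    </StringDataEncoding>
    ```

    Unlike the TerminationChar inside SizeInBitsType, the one declared here has no default, so a value has to be written out explicitly.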

    To support this SizeInBits versus Variable distinction, SizeInBitsType is also simplified, since it is used only within StringDataEncodingType. Replace the existing SizeInBitsType with the definition below. The historic comment about Castor COTS has also been removed, and the default value of TerminationChar has been restored.

    <complexType name="SizeInBitsType">
      <choice>
        <element name="Fixed">
          <complexType>
            <sequence>
              <element name="FixedValue" type="xtce:FixedIntegerValueType"/>
            </sequence>
          </complexType>
        </element>
        <element name="TerminationChar" type="hexBinary" default="00">
          <annotation>
            <documentation xml:lang="en">Like C strings, they are terminated with a special string, usually a null character.</documentation>
          </annotation>
        </element>
        <element name="LeadingSize" type="xtce:LeadingSizeType"/>
      </choice>
    </complexType>
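
    The effect of the restored default can be sketched with two instance fragments. Both are illustrative only. In particular, the two-byte terminator for UTF-16LE is an assumption about how a decoder would match the terminator, not something this issue states.

    ```xml
    <!-- Empty element: a schema-aware processor supplies the declared default, 00. -->
    <StringDataEncoding encoding="US-ASCII">
      <SizeInBits>
        <TerminationChar/>
      </SizeInBits>
    </StringDataEncoding>

    <!-- Explicit terminator; the two-byte null for UTF-16LE is an assumption. -->
    <StringDataEncoding encoding="UTF-16LE">
      <SizeInBits>
        <TerminationChar>0000</TerminationChar>
      </SizeInBits>
    </StringDataEncoding>
    ```

    As written in this resolution, Fixed, TerminationChar and LeadingSize are alternatives within a choice, so only one of them can appear inside a given SizeInBits element.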

  • Updated: Tue, 10 Jul 2018 14:22 GMT