CORBA 3.0 NO IDEA Avatar
  1. OMG Issue

CORBA3 — ORBs using BOMs for UTF-16 (closely related to issue 4008)

  • Key: CORBA3-39
  • Legacy Issue Number: 4723
  • Status: closed  
  • Source: International Business Machines ( Richard Sitze)
  • Summary:

    [3, Chapter 15.3.1.6] mentions the use of BOM to indicate (and override the
    OMG byte order indicator flag [3, Chapter 15.2.1]) the endian-ness of the
    UTF-16 encoded wchar or wstring data.

    This is incorrect and goes against the Unicode recommendations [1]?refer to
    the Unicode conformance clause C3 [4, Chapter 3.1], and the discussion
    related to the use of BOM [4, Chapter 2.7].

    [4, Chapter 3.1] unambiguously implies that a BOM is not necessary if a
    higher-level protocol indicates the endian-ness. [4, Chapter 2.7]
    categorically states: "if other signaling methods (the OMG byte order flag
    in this context) are used, signatures (BOM) should not be employed".

    The UTF-16 endian rules of [3, Chapter 15.3.1.6] are clearly influenced by
    [2]. In the MIME world, an initial U+FEFF or U+FFFE is interpreted as BOMs.
    The BOM (or its absence) indicates the endian-ness of UTF-16 encoded data
    in the internet MIME world. But for CORBA messages or CDR encapsulations,
    the OMG byte order flag is already explicitly marking the UTF-16 encoded
    data as UTF-16BE or as UTF-16LE. U+FEFF or U+FFFE should not be used as
    BOMs for UTF-16 encoded data in the CORBA domain.

    Therefore, it is proposed that any U+FEFF or U+FFFE, regardless of their
    positions in the marshalled data, must be interpreted as ZERO WIDTH
    NO-BREAK SPACE characters, and not as BOMs. All the references to BOM in
    [3, Chapter 15.3.1.6] must be removed altogether.

    Adoption of the above Unicode conformant rule will
    – result in more efficient encoding of wchar/wstring data?no need to place
    U+FFFE for little-endian UTF-16/UTF-32 wchars/wstrings,
    – eliminate the ugly situation, where the BOM of an UTF-16/UTF-32 encoded
    wchar/wstring data contained in a message or CDR encapsulation indicate a
    different byte order than that specified by the OMG byte order flag for the
    same message or CDR encapsulation.

  • Reported: CORBA 2.6 — Tue, 4 Dec 2001 05:00 GMT
  • Disposition: Resolved — CORBA 3.0.2
  • Disposition Summary:

    This proposal results in a complete reversal of an earlier adopted resolution, and hence would be in

  • Updated: Fri, 6 Mar 2015 20:58 GMT