Legacy Issue Number: 4723
Source: International Business Machines (Richard Sitze)
[3, Chapter 126.96.36.199] mentions the use of a BOM (byte order mark) to
indicate the endianness of UTF-16 encoded wchar or wstring data, overriding
the OMG byte order indicator flag [3, Chapter 15.2.1].
This is incorrect and goes against the Unicode recommendations; refer to
the Unicode conformance clause C3 [4, Chapter 3.1] and the discussion of
BOM usage [4, Chapter 2.7].
[4, Chapter 3.1] unambiguously implies that a BOM is not necessary if a
higher-level protocol indicates the endianness. [4, Chapter 2.7]
categorically states: "if other signaling methods [the OMG byte order flag
in this context] are used, signatures [BOM] should not be employed".
The UTF-16 endian rules of [3, Chapter 188.8.131.52] are clearly influenced by
. In the MIME world, an initial U+FEFF or U+FFFE is interpreted as a BOM.
In the Internet MIME world, the BOM (or its absence, which implies
big-endian) indicates the endianness of UTF-16 encoded data. For CORBA
messages and CDR encapsulations, however, the OMG byte order flag already
explicitly marks the UTF-16 encoded data as UTF-16BE or UTF-16LE. U+FEFF
and U+FFFE should therefore not be used as BOMs for UTF-16 encoded data in
the CORBA domain.
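For contrast, the MIME convention can be sketched in Python. The helper name `mime_utf16_decode` is hypothetical, and the logic is a simplified reading of that convention (a leading BOM selects the byte order and is consumed; with no BOM, big-endian is assumed); it does no validation beyond what Python's codecs already do.

```python
def mime_utf16_decode(octets: bytes) -> str:
    """Decode UTF-16 text MIME-style (hypothetical helper).

    A leading BOM selects the byte order and is stripped; without one,
    the data is assumed to be big-endian.
    """
    if octets[:2] == b"\xfe\xff":  # BOM serialized big-endian
        return octets[2:].decode("utf-16-be")
    if octets[:2] == b"\xff\xfe":  # BOM serialized little-endian
        return octets[2:].decode("utf-16-le")
    return octets.decode("utf-16-be")  # no signature: big-endian default


print(mime_utf16_decode(b"\xff\xfeh\x00i\x00"))  # hi
print(mime_utf16_decode(b"\x00h\x00i"))          # hi
```

In a CORBA message or CDR encapsulation the byte order is already known from the flag before the first octet of the wchar/wstring is read, so this sniffing step has nothing left to decide.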
Therefore, it is proposed that any U+FEFF or U+FFFE, regardless of its
position in the marshalled data, must be interpreted as a ZERO WIDTH
NO-BREAK SPACE character and not as a BOM. All references to the BOM in
[3, Chapter 184.108.40.206] must be removed.
Adoption of the above Unicode-conformant rule will
– result in more efficient encoding of wchar/wstring data, since there is
no need to place a BOM in front of little-endian UTF-16/UTF-32
wchars/wstrings;
– eliminate the ugly situation where the BOM of UTF-16/UTF-32 encoded
wchar/wstring data contained in a message or CDR encapsulation indicates a
different byte order than the one specified by the OMG byte order flag for
the same message or CDR encapsulation.
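Both benefits can be illustrated with a small Python sketch. All function names are hypothetical, and `encode_wstring_earlier` is an assumed reading of the first point above (little-endian data carried a leading BOM, big-endian data did not), not text taken from the specification.

```python
def encode_wstring_earlier(text: str, little_endian: bool) -> bytes:
    """Assumed earlier behaviour: little-endian data carries a leading BOM."""
    if little_endian:
        return b"\xff\xfe" + text.encode("utf-16-le")
    return text.encode("utf-16-be")


def encode_wstring_proposed(text: str, little_endian: bool) -> bytes:
    """Proposed behaviour: the OMG byte order flag alone conveys the order."""
    return text.encode("utf-16-le" if little_endian else "utf-16-be")


def bom_contradicts_flag(octets: bytes, little_endian: bool) -> bool:
    """True if a leading BOM claims the opposite byte order to the flag.

    Only meaningful under the earlier rule; under the proposed rule these
    octets are ordinary character data and no such check is needed.
    """
    if octets[:2] == b"\xfe\xff":  # BOM claims big-endian
        return little_endian
    if octets[:2] == b"\xff\xfe":  # BOM claims little-endian
        return not little_endian
    return False


# Two octets saved for this little-endian UTF-16 wstring.
print(len(encode_wstring_earlier("hi", True)), len(encode_wstring_proposed("hi", True)))  # 6 4
# A big-endian BOM inside a little-endian message is the conflicting case.
print(bom_contradicts_flag(b"\xfe\xff\x00h", little_endian=True))  # True
```

The saving is two octets per little-endian UTF-16 value (a UTF-32 BOM would cost four), and `bom_contradicts_flag` has no counterpart under the proposed rule, because the data no longer expresses a byte order at all.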
Reported: CORBA 2.6 — Tue, 4 Dec 2001 05:00 GMT
Disposition: Resolved — CORBA 3.0.2
This proposal results in a complete reversal of an earlier adopted resolution, and hence would be in
Updated: Fri, 6 Mar 2015 20:58 GMT
CORBA3 — ORBs using BOMs for UTF-16 (closely related to issue 4008)
- Key: CORBA3-39
- OMG Task Force: Core 2002 RTF