Legacy Issue Number: 4008
Source: AT&T (Duncan Grisby)
In a similar vein to Vishy's question about alignment, what should the
endianness of a word-oriented wchar be? This applies both to single
wchars, and the separate code points in a wstring. With the 2.3 spec,
it seemed quite obvious to me that word-oriented wide characters
should have the same endianness as the rest of the stream. After all,
they are no different from any other word-oriented type.
However, with the new 2.4 spec, there is now a bizarre section saying
that if, and only if, the TCS-W is UTF-16, all wchar values are
marshalled big-endian unless there is a byte-order-mark telling you
otherwise. I don't understand the point of this. Section 2.7 of the
Unicode Standard, version 3.0 says [emphasis mine]:
"Data streams that begin with U+FEFF byte order mark are likely to
contain Unicode values. It is recommended that applications sending
or receiving untyped data streams of coded characters use this
signature. _If other signaling methods are used, signatures should
not be employed._"
It seems quite clear to me that a GIOP stream is a typed data stream
which uses its own signalling methods. The Unicode standard therefore
says that a BOM should not be used.
I guess it's too late to clean up the UTF-16 encoding, but what about
other word-oriented code sets? What if the end-points have negotiated
the use of UCS-4? Should that be big-endian unless there's a BOM?
The spec doesn't say. Even worse, what if the negotiated encoding is
something like Big5? That doesn't have byte order marks. Big5
doesn't have a one-to-one Unicode mapping, so it's not sensible to
always translate to UTF-16.
GIOP already has a perfectly good mechanism for sorting out this kind
of issue. Please can wchar be considered on equal footing with all
other types, and use the stream's endianness?
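To make the difference concrete, here is a minimal sketch of the two readings for a UTF-16 wchar. It is only an illustration, not text from either spec: the function names are hypothetical, and the input is assumed to be just the octets of the character itself, with any length prefix or alignment padding already stripped.

```python
# Hypothetical helpers; names and framing are assumptions for illustration.
BOM_BE = b"\xfe\xff"  # U+FEFF serialised big-endian
BOM_LE = b"\xff\xfe"  # U+FEFF serialised little-endian


def decode_utf16_wchar_2_4(octets: bytes) -> str:
    """2.4 reading as described above: big-endian unless a BOM says otherwise."""
    if octets.startswith(BOM_LE):
        return octets[2:].decode("utf-16-le")
    if octets.startswith(BOM_BE):
        return octets[2:].decode("utf-16-be")
    # No BOM: the byte order of the enclosing stream is ignored.
    return octets.decode("utf-16-be")


def decode_wchar_stream_order(octets: bytes, little_endian: bool) -> str:
    """Requested reading: follow the byte order of the enclosing GIOP stream."""
    return octets.decode("utf-16-le" if little_endian else "utf-16-be")
```

Under the first function, a little-endian sender has to either byte-swap every wchar or prepend a BOM to each one. Under the second, wchar is handled like any other word-oriented type and needs no special case.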
Reported: CORBA 2.4 — Tue, 31 Oct 2000 05:00 GMT
Disposition: Resolved — CORBA 3.0.2
Updated: Fri, 6 Mar 2015 20:58 GMT