CORBA 3, chapter 13.8, defines the Codec interface to encode
arbitrary data values into CORBA::OctetSeq "blobs" and vice
versa. This interface can be used, e.g., to supply and retrieve
ServiceContext data using the PortableInterceptor interfaces.
In practice, the Codec interface is also used for data
serialization, i.e., to store arbitrary values in files or
databases and to retrieve them later.
However, the interface is deficient in that it does not cover
all parameters that are needed for interoperability.
It supports setting the CDR version that is to be used, but
neglects byte order and codeset settings.
Consequently, the encoded values are platform-specific. If a
value was encoded on a little-endian system, it will not decode,
or worse, decode erroneously, on a big-endian system. The same
caveats apply to codesets, e.g., when an ISO-8859-1 encoded
blob is decoded using UTF-8 or Windows-1252.
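The two failure modes can be illustrated without an ORB. The following is a minimal Python sketch, not actual Codec output: it uses the standard struct module and Python's built-in text codecs as stand-ins for an encoder that records neither byte order nor codeset. All helper names are hypothetical.

```python
import struct


def encode_long(value: int, little_endian: bool) -> bytes:
    """Pack a 32-bit signed integer without recording the byte order."""
    return struct.pack("<i" if little_endian else ">i", value)


def decode_long(blob: bytes, little_endian: bool) -> int:
    """Unpack a 32-bit signed integer, trusting the caller's byte order."""
    return struct.unpack("<i" if little_endian else ">i", blob)[0]


def try_decode(blob: bytes, codeset: str):
    """Return the decoded text, or None if the bytes are invalid in codeset."""
    try:
        return blob.decode(codeset)
    except UnicodeDecodeError:
        return None


# Byte order: written little-endian, read back under both assumptions.
blob = encode_long(1000, little_endian=True)
same_order = decode_long(blob, little_endian=True)    # round-trips
other_order = decode_long(blob, little_endian=False)  # decodes, but wrongly

# Codeset: ISO-8859-1 bytes read back under other codesets.
# The sample word contains u-umlaut and sharp s.
text_blob = "Gr\u00fc\u00dfe".encode("iso-8859-1")
as_utf8 = try_decode(text_blob, "utf-8")        # does not decode at all
as_latin1 = try_decode(b"\x80", "iso-8859-1")   # a C1 control character
as_cp1252 = try_decode(b"\x80", "cp1252")       # a different character
```

The integer case shows the "decode erroneously" outcome, since any four octets form a valid 32-bit value. The UTF-8 case shows the "will not decode" outcome, and the Windows-1252 case shows the same octet silently yielding a different character.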
To support interoperability, the Codec interface needs to be
extended.
I recommend extending the CodecFactory interface so that it
can create Codec instances for a specific CDR version, byte
order, and codeset. For each of these parameters, the caller
either supplies a value or requests the default and is told
which value was chosen.
Example:
module IOP {
  const EncodingFormat ENCODING_DEFAULT = -1;

  typedef short ByteorderFormat;
  const ByteorderFormat BYTEORDER_DEFAULT = -1;
  const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
  const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;

  struct EncodingExt
  {
    EncodingFormat format;
    octet major_version; // set to 0 for default
    octet minor_version;
    ByteorderFormat byteorder;
    CONV_FRAME::CodeSetId char_data; // set to 0 for default
    CONV_FRAME::CodeSetId wchar_data; // set to 0 for default
  };

  local interface CodecFactory
  {
    // create_codec remains as before
    Codec create_codec_ext (inout EncodingExt enc)
      raises (UnknownEncoding);
  };
};
The create_codec_ext operation would create an appropriate
Codec instance, if available, and then set all "default"
members of the EncodingExt structure to their actual values,
so that the application can store this information along
with any encoded values.
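To make the intended inout semantics concrete, here is a small Python model of the "fill in the defaults" step only. It is a sketch under stated assumptions, not a proposed implementation: the function name resolve_defaults is hypothetical, and the default CDR version and codesets below are placeholders for whatever an ORB would actually choose.

```python
import sys
from dataclasses import dataclass, replace

ENCODING_DEFAULT = -1
ENCODING_CDR_ENCAPS = 0        # existing IOP constant
BYTEORDER_DEFAULT = -1
BYTEORDER_BIGENDIAN = 0
BYTEORDER_LITTLEENDIAN = 1

# Assumed defaults, for illustration only; an ORB would pick its own.
ASSUMED_CDR_VERSION = (1, 2)
ASSUMED_CHAR_CODESET = 0x00010001    # ISO 8859-1 in the OSF registry
ASSUMED_WCHAR_CODESET = 0x00010109   # UTF-16 in the OSF registry


class UnknownEncoding(Exception):
    """Stand-in for IOP::CodecFactory::UnknownEncoding."""


@dataclass(frozen=True)
class EncodingExt:
    """Python mirror of the proposed IDL struct; all members default."""
    format: int = ENCODING_DEFAULT
    major_version: int = 0
    minor_version: int = 0
    byteorder: int = BYTEORDER_DEFAULT
    char_data: int = 0
    wchar_data: int = 0


def resolve_defaults(enc: EncodingExt) -> EncodingExt:
    """Return a copy of enc with every "default" member made explicit."""
    fmt = ENCODING_CDR_ENCAPS if enc.format == ENCODING_DEFAULT else enc.format
    if fmt != ENCODING_CDR_ENCAPS:
        raise UnknownEncoding(fmt)
    major, minor = enc.major_version, enc.minor_version
    if major == 0:
        major, minor = ASSUMED_CDR_VERSION
    byteorder = enc.byteorder
    if byteorder == BYTEORDER_DEFAULT:
        # Assumption: the default is the byte order of the local machine.
        byteorder = (BYTEORDER_LITTLEENDIAN if sys.byteorder == "little"
                     else BYTEORDER_BIGENDIAN)
    return replace(
        enc,
        format=fmt,
        major_version=major,
        minor_version=minor,
        byteorder=byteorder,
        char_data=enc.char_data or ASSUMED_CHAR_CODESET,
        wchar_data=enc.wchar_data or ASSUMED_WCHAR_CODESET,
    )


# The resolved structure is what an application would store with the blob.
resolved = resolve_defaults(EncodingExt())
```

After the call, no member of the structure holds a "default" marker any more, so a reader on another platform can request exactly the same Codec.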
One potential criticism of the above is that the set of
parameters depends on the encoding format, whereas EncodingExt
hard-codes the parameters of CDR. For example, there may be
encoding formats that are byte order-independent, or that
consistently use UTF-32 for strings and thus need no codeset
parameters. They may also use entirely different versioning
schemes. So a "better" solution might involve passing the
EncodingFormat together with an any that holds a
format-specific data type.
That could look like:
module GIOP {
  typedef short ByteorderFormat;
  const ByteorderFormat BYTEORDER_DEFAULT = -1;
  const ByteorderFormat BYTEORDER_BIGENDIAN = 0;
  const ByteorderFormat BYTEORDER_LITTLEENDIAN = 1;

  struct CDREncodingParameters
  {
    octet major_version; // set to 0 for default
    octet minor_version;
    ByteorderFormat byteorder;
    CONV_FRAME::CodeSetId char_data; // set to 0 for default
    CONV_FRAME::CodeSetId wchar_data; // set to 0 for default
  };
};

module IOP {
  const EncodingFormat ENCODING_DEFAULT = -1;

  local interface CodecFactory
  {
    // create_codec remains as before
    Codec create_codec_ext (inout EncodingFormat format,
                            inout any parameters)
      raises (UnknownEncoding);
  };
};
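The dispatch that this second variant implies can be modeled as follows. This is again only a Python sketch under assumptions: the TextEncodingParameters format and its constant are invented to show a format without byte order or codeset, the default values are placeholders, and treating a mismatched parameter type as UnknownEncoding is one possible choice that the IDL above does not settle. The inout arguments are modeled as a returned pair.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

ENCODING_DEFAULT = -1
ENCODING_CDR_ENCAPS = 0      # existing IOP constant
ENCODING_TEXT = 100          # hypothetical second format
BYTEORDER_DEFAULT = -1
BYTEORDER_BIGENDIAN = 0


class UnknownEncoding(Exception):
    """Stand-in for IOP::CodecFactory::UnknownEncoding."""


@dataclass(frozen=True)
class CDREncodingParameters:
    major_version: int = 0
    minor_version: int = 0
    byteorder: int = BYTEORDER_DEFAULT
    char_data: int = 0
    wchar_data: int = 0


@dataclass(frozen=True)
class TextEncodingParameters:
    """Hypothetical format that needs neither byte order nor codeset."""
    schema_version: int = 0


def _resolve_cdr(p: CDREncodingParameters) -> CDREncodingParameters:
    # All fallback values here are assumed defaults, for illustration only.
    major, minor = ((p.major_version, p.minor_version)
                    if p.major_version else (1, 2))
    byteorder = (p.byteorder if p.byteorder != BYTEORDER_DEFAULT
                 else BYTEORDER_BIGENDIAN)
    return CDREncodingParameters(major, minor, byteorder,
                                 p.char_data or 0x00010001,
                                 p.wchar_data or 0x00010109)


def _resolve_text(p: TextEncodingParameters) -> TextEncodingParameters:
    return TextEncodingParameters(p.schema_version or 1)


# Each format registers its own parameter type and default resolution.
_FORMATS = {
    ENCODING_CDR_ENCAPS: (CDREncodingParameters, _resolve_cdr),
    ENCODING_TEXT: (TextEncodingParameters, _resolve_text),
}


def create_codec_ext(fmt: int, parameters: Optional[Any]) -> Tuple[int, Any]:
    """Return the resolved (format, parameters) pair for the request."""
    if fmt == ENCODING_DEFAULT:
        fmt = ENCODING_CDR_ENCAPS
    if fmt not in _FORMATS:
        raise UnknownEncoding(fmt)
    param_type, resolve = _FORMATS[fmt]
    if parameters is None:
        parameters = param_type()        # assumption: None means all defaults
    if not isinstance(parameters, param_type):
        raise UnknownEncoding(fmt)       # assumption: mismatch is rejected
    return fmt, resolve(parameters)


fmt, params = create_codec_ext(ENCODING_DEFAULT, None)
```

The factory signature stays fixed while each encoding format brings its own parameter type, which is the point of passing an any instead of a CDR-shaped structure.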
Once we have consensus on the approach, I will gladly volunteer
to come up with a full set of editing instructions.