DDS-XTypes 1.3 RTF Avatar
  1. OMG Issue

DDSXTY13 — Algorithm to compute autoid is missing from the specification

  • Key: DDSXTY13-2
  • Status: closed  
  • Source: Real-Time Innovations ( Dr. Gerardo Pardo-Castellote, Ph.D.)
  • Summary:

    The specification does not define how memberIDs are computed in the case where the @autoid (or @hashid) annotations are used. This is needed for interoperability.

    Section 7.3.1.2.1.1 (Member IDs) specifies there are thee ways to set them:

    • Automatically following a progression that starts from the most-recently specified member ID.
    • Using the @id annotation
    • As a "hash" on the member name when @autoid(HASH) is specified
    • as a "hash" on a string-parameter when @hashid("string-parameter") is specified

    The use of @autoid refers to sub clause 8.3.1.2 in [IDL41]). But there also there is no specification of how the hash should be computed. Only that a "hashing" algorithm should be used.

    IDL working group discussed this and the preference was to leave it unspecified in IDL4 and instead put it in XTYPES or whichever specification depends on these IDs.

    A proposal would be to use an MD5 (as this is already used for key hashing).
    This needs to be done in a platform-independent manner, for example hashing a serialized representation of the string using a pre-specified endianness. Also per section 7.2.2.4.4.3 (Member IDs) the value/range must be representable in 28 bits.

  • Reported: DDS-XTypes 1.2 — Fri, 7 Apr 2017 00:44 GMT
  • Disposition: Resolved — DDS-XTypes 1.3
  • Disposition Summary:

    Use the same hash algorithm specified for the TypeObject (Minimal)

    The (normative) IDL for the TypeObject (Minimal Representation) already requires hashing of member names. This is exercised in the (non-normative) example serializations included as part of XTYPES 1.2.

    The algorithm used for this hashing of member names is described as a comment in the dds-xtypes_typeobject.idl

        // First 4 bytes of MD5 of of a member name converted to bytes
        // using UTF-8 encoding and without a 'nul' terminator.
        // Example: the member name "color" has NameHash {0x70, 0xDD, 0xA5, 0xDF}
        typedef octet NameHash[4];
    

    Although this hash algorithm is only specified for the TypeObject and not the MemberId it would make things simpler to use the same algorithm for the @autoid and @hashid. It is also more efficient that doing a CDR serialization because it saves the length and the trailing NUL.

    We still need to specify how to transform the octet[ 4 ] NameHash into a integer memberId with 28-bits of precision so we can all the 4 bits of flags specified in the MemberId format. This basically requires interpreting the NameHash as having a specific endianness.

    Section 7.6.2.3.3 'TypeLookup Types and Endpoints' provides some examples of NameHash that use Big Endian. For example it says that @hashid("getTypes") is a long with value = 0xd35282d1. Note that the md5("getTypes") is the octet sequence d35282d177f1c8e43144c7e65820d7ae. So value = 0xd35282d1 corresponds to the first 4 bytes interpreted as a BigEndian.

    Therefore an Endianess needs to be specified such that an octet[ 4 ] containing the first 4 bytes of the MD5 can be interpreted as an integer. For example if we had chosen Little Endian the NameHash =

    {0x70, 0xDD, 0xA5, 0xDF}

    would be interpreted as the integer: 0xDFA5DD70.

    We still need to zero 4 bits of this to truncate it to the 28 bits of the memberId and allow the addition of the must-understand flag (M_FLAG) and the length code (LC).

    Table 39 in section 7.4.3.3 specifies that the enhanced mutable header is computed as:

    EMHEADER1 =  (M_FLAG<<31) + (LC<<28) + M.id
    

    This means that we need to clear the 4 most significant bits from the member id (bits 29-32).

    The resolution is to explicitly provide the algorithm used to compute the member ID from a NameHash. This algorithm interprets the NameHash as a Big Endian integer. Using C pseudo code the construction of the member_id from a NameHash would be as follows:

        uint32_t  hash_as_int;
        uint32_t  member_id;
    
    #ifdef (MACHINE_IS_BIG_ENDIAN)
        hash_as_int = *((uint32_t *) NameHash);
    #else
        hash_as_int = *((uint32_t *)EndianessSwap4Bytes(NameHash));
    #endif
    
    // Truncate to 28 bits removing the 4 most significant bits
    member_id = hash_as_int & 0x0FFFFFFF;
    

    In the case of member name "color" we would have:

    NameHash = { 0x70, 0xDD, 0xA5, 0xDF } 
    hash_as_int = 0x70DDA5DF
    member_id  = 0x00DDA5DF
    

    Assuming we have M_FLAG = 1, LC = 1, we would have:

    EMHEADER1 = 0x90DDA5DF
    

    EMHEADER1 is then serialized on the wire using Little Endian format (as specified by the TypeObject serialization) so the bytes on the wire would end up being:

    {0xDF, 0xA5, 0xDD, 0x90} 
    


    Alternatively (currently favored by Task Force)

    We could define the transformation of NameHash to MemberId using LittleEndian. This will save the byte swap done at serialization time. An also save the initial on in LittleEndian machines (which are the most common). So this seems a more optimized approach.''

    In this case member name "color" we would have:

    NameHash = { 0x70, 0xDD, 0xA5, 0xDF } 
    hash_as_int = 0xDFA5DD70
    member_id  = 0x0FA5DD70
    

    If we go with this we need to change the examples in section 7.6.2.3.3 'TypeLookup Types and Endpoints'

    NameHash("getTypes") = {d3, 52, 82, d1}
    NameHash("getDependencies") = {31, fb, aa, 35}
    Also change the example to be unsigned long:

    const unsigned long TypeLookup_getTypes_Hash = 0x018252d3; // @hashid("getTypes")
    const unsigned long TypeLookup_getDependencies_Hash = 0x05aafb31; //@hashid("getDependencies"); 
    
  • Updated: Tue, 8 Oct 2019 17:55 GMT