IDL 4.2 RTF Avatar
  1. OMG Issue

IDL42 — IDL Lacks Support for 8-bit Signed/Unsigned Integers

  • Key: IDL42-2
  • Status: closed  
  • Source: Real-Time Innovations ( Dr. Gerardo Pardo-Castellote, Ph.D.)
  • Summary:

    As of version 4.1, IDL lacks explicit support for 8-bit signed/unsigned integers. While, there are two data type with 1-byte size, these are not suitable for encoding 8-bit integers:

    • char => 8-bit quantity that:
      • encodes a single-byte character from any byte-oriented code set, or
      • in an array, encodes a multi-byte character from a multi-byte code set
    • octet => opaque 8-bit. guaranteed not to undergo any change by the middleware.

    Unfortunately, reference programming languages don't have consistent behavior:

    • Java: byte => signed 8-bit
    • C#
      • byte => unsigned 8-bit
      • sbyte => signed 8 bit
    • C (C90 and later)
    • C++
      • char => recommended for chars only. Allowed for numbers but unspecified sign
      • signed char => signed 8-bit number
      • unsigned char => unsigned 8-bit number

    Other dialects of IDL provide support for 8-bit signed/unsigned integers as follows:

    • MIDL (Microsoft's IDL)
      • Introduces the small keyword to represent 8-bit integers (see MIDL documentation). Additionally, it uses the hyper keyword to represent 64-bit integers (i.e., a long long in OMG IDL).
      • The integer types in MIDL are:
        • [unsigned] small (8-bit integer)
        • [unsigned] short (16-bit integer)
        • [unsigned] long (32-bit integer)
        • [unsigned] hyper (64-bit integer)
      • This IDL dialect introduces also the signed keyword and therefore numeric values can be marked as signed or unsigned (if unspecified they default to signed).
      • Web IDL
        • Introduces byte keyword to represent a signed 8-bit integer and treats octets as an unsigned 8-bit integer (see documentation).
        • The integer types in Web IDL are:
          • byte (signed 8-bit integer)
          • octet (unsigned 8-bit integer)
          • [unsigned] short (16-bit integer)
          • [unsigned] long (32-bit integer)
          • [unsigned] long long (64-bit integer)
    • XPIDL
  • Reported: IDL 4.1 — Wed, 26 Jul 2017 14:28 GMT
  • Disposition: Resolved — IDL 4.2
  • Disposition Summary:

    Add support for 8-bit Integers

    We have identified four possible candidate solutions listed below. See the end for the decision on the adopted solution.

    1. Introduce a new keyword in IDL to indicate an 8-bit integer.

    This would preserve the current semantic and mapping of octet (opaque, do not use it as a number). If we do this it would be like the other integers in IDL where it is signed unless we qualify it with "unsigned."

    This solution has three alternatives depending on the keyword selected. We have considered byte, small, and tinyint:

    Alternative 1.1: Add new byte keyword

    byte  ==> signed 8-bit integer
    unsigned byte  ==> unsigned 8-bit number
    

    This approach is clean and consistent but we are concerned with using the keyword "byte" because of the ambiguity with the definitions in C# and Java. It also does not sound like a number. It sounds like "octet". In fact it means exactly that.

    Alternative 1.2: Add new small keyword (following MIDL's model)

    small  ==> signed 8-bit integer
    unsigned small  ==> unsigned 8-bit number
    

    Alternative 1.3: Add new tinyint keyword (following SQL model)

    tinyint  ==> signed 8-bit integer
    unsigned tinyint  ==> unsigned 8-bit number
    

    *Alternative 1.4: Add new int8 and

    {uint8}

    keywords (following new C/C++ standard names)*

    int8    ==> signed 8-bit integer
    uint8  ==> unsigned 8-bit number
    

    Solution 2: Use annotations on "octet" to indicate use as integer

    This would mean that we could have "octet" type used three different ways:

    octet                        -- current meaning (opaque 8-bit) not a number
    @signed octet       -- signed 8-bit number
    @unsigned octet  -- unsigned 8-bit number
    

    This approach seems less clean than Approach 1 but it has the advantage of being less ambiguous and backwards compatible. Having said that, having both a keyword and an annotation called unsigned is a bit ambiguous. Also this would be the only use of the @unsigned annotation which is confusing. For example, someone may be tempted to use @unsigned short as a type.

    Solution 3: Leverage the existing @bit_bound annotation

    IDL4 says this about the @bit_bound annotation:

    "it may be used to force a size, smaller than the default one to members of an enumeration or to integer elements."

    So it seems if we used:

    struct MyStruct {
         @bit_bound(8)  short             member1;
         @bit_bound(8)  unsigned short    member2;
    };
    

    The @bit_bound(8) would affect the serialization and also the language mapping.

    For example in Java member1 could be typed as byte whereas member2 would be short in order to accommodate values between 128 and 255.

    The advantage of Solution 3 is that it reuses the concepts already present without new keywords and annotations in a manner consistent with the intended purpose.

    There are two alternative ways to do this:

    Alternative 3.1
    Just explain in the IDL 4 spec that this is the pattern used to model 8-bit integer types.

    Alternative 3.2 (providing typedef for all integer types)

    Expand the IDL 4 to define typedefs for all integer types making the size of the representation more obvious (similar to the new standard integer types in C99 and C++).

    typedef @bit_bound(8) short              int8;
    typedef @bit_bound(8) unsigned short     uint8;
    typedef               short              int16; 
    typedef               unsigned short     uint16;
    typedef               long               int32;
    typedef               unsigned long      uint32;
    typedef               long long          int64;
    typedef               unsigned long long uint64;
    

    Solution 4: Allow the existing keyword "unsigned" to be used with octet

    There are two alternative ways to implement this.

    Alternative 4.1 Add also a signed keyword

    Add signed as a keyword, which overcomes the ambiguity of solution (2). With the signed keyword, we could specify whether an octet (or a char) is signed or unsigned and follow C/C++ convention.

    octet                   -- current meaning (opaque 8-bit) not a number
    signed octet       -- signed 8-bit number
    unsigned octet   -- unsigned 8-bit number
    

    Alternative 4.2 Redefine “octet” to mean signed int

    With this approach we would have:

    octet opaque 8-bit. guaranteed not to undergo any change by the middleware.
    Able to hold signed integer values within the range -128 to 127 In the language it is mapped to type that can handle signed integer values within the range -128 to 127
    unsigned octet As octet also opaque 8-bit. guaranteed not to undergo any.
    Able to hold unsigned integer values within the range 0 to 255

    Thus in C we would map IDL octet to C90 "signed char" and IDL "unsigned octet" to C90 unsigned char.

    This mapping may break API portability with previous mappings, or break application code where the "char" was mapped to an unsigned value (e.g. in ARM processors).

    To workaround it we could provide some way to prevent the mapping to "signed char". Maybe this is some command-line option to rtiddsgen, like "-nosignedchar"

    Adopted solution

    The chosen solution is 1 with alternative 4 (int8, uint8).
    The reason to prefer introducing new keywords rather than relaying on annotations is that using an annotation (e.g. @bit_bound(8) opens the door for that annotation to be used in unexpected context and with un-expected parameters. For example the user could annotate not just a short, but also a long or long long. Or they could use a different value other than 8. All these cases would have to be described and handled resulting in more complexity for the user and tools. The "keyword" approach is simpler, more constrained, and better aligned with what the user expects.

    The reason to choose int8 and uint8 is to align it with the resolution of IDL42-9. So we save having to add another keyword like "small
    or "tiny" which would not be intuitive to the user.

  • Updated: Tue, 19 Dec 2017 20:04 GMT
  • Attachments: