Archicad C++ API

uchar_t is confusing

ReignBough
Enthusiast
When I read a uchar_t data type, I thought it was an unsigned char type. But when I looked it up in Definitions.hpp, I found out that it is:
typedef unsigned short              uchar_t;        // 2-byte unicode character (UTF16)
Why?????? This is so confusing.
~ReignBough~
ARCHICAD 26 INT (from AC18)
Windows 11 Pro, AMD Ryzen 7, 3.20GHz, 32.0GB RAM, 64-bit OS
Ralph Wessel
Mentor
ReignBough wrote:
When I read a uchar_t data type, I thought it was an unsigned char type. But when I looked it up in Definitions.hpp, I found out that it is:
typedef unsigned short              uchar_t;        // 2-byte unicode character (UTF16)
Why?????? This is so confusing.
I'm not sure why this is confusing, but I assume you've not worked with any text encoding other than ASCII? Early text encodings like ASCII used only 1 byte (8 bits) per character, so the types char or unsigned char equated to a single character. However, a single byte isn't enough for the many languages with far more than 256 characters.

uchar_t targets UTF-16, whose code units are 16 bits wide, hence the unsigned short type. For more information about text encoding, take a look here.
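To make this concrete, here is a minimal standalone sketch (only the typedef is quoted from Definitions.hpp; everything else is illustrative): a single byte can distinguish at most 256 characters, while a uchar_t can hold any UTF-16 code unit.
typedef unsigned short uchar_t;    // as in Definitions.hpp

int main()
{
    unsigned char latin = 'A';     // U+0041 fits comfortably in one byte
    uchar_t       han   = 0x6C49;  // U+6C49 (Chinese '汉') needs all 16 bits
    static_assert(sizeof(uchar_t) == 2, "a UTF-16 code unit is 2 bytes");
    (void)latin; (void)han;        // silence unused-variable warnings
    return 0;
}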
Ralph Wessel BArch
Software Engineer Speckle Systems
ReignBough
Enthusiast
Well, when we are coding and want to specify that a character is 16 bits, we use/define wchar / WCHAR (wide char, exactly 16 bits, based on wchar_t) or char16 / CHAR16 (at least 16 bits). We use uchar / UCHAR for unsigned char (and char / CHAR for signed char) for exactly 8 bits, and char8 / CHAR8 for at least 8 bits.

This is just a thought.
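For comparison, here is a quick sketch of the character types the C++11 standard itself provides (the variable names are just illustrative; the sizes in the comments are what mainstream compilers use):
int main()
{
    char     c   = 'A';        // at least 8 bits; exactly 8 on mainstream platforms
    char16_t c16 = u'\u6C49';  // sized like uint_least16_t; holds a UTF-16 code unit
    char32_t c32 = U'\u6C49';  // sized like uint_least32_t; holds any Unicode code point
    wchar_t  w   = L'A';       // implementation-defined: 16 bits on Windows, 32 on Linux/macOS
    (void)c; (void)c16; (void)c32; (void)w;
    return 0;
}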
~ReignBough~
ARCHICAD 26 INT (from AC18)
Windows 11 Pro, AMD Ryzen 7, 3.20GHz, 32.0GB RAM, 64-bit OS
Ralph Wessel
Mentor
ReignBough wrote:
Well, when we are coding and want to specify that a character is 16 bits, we use/define wchar / WCHAR (wide char, exactly 16 bits, based on wchar_t) or char16 / CHAR16 (at least 16 bits). We use uchar / UCHAR for unsigned char (and char / CHAR for signed char) for exactly 8 bits, and char8 / CHAR8 for at least 8 bits.
None of these definitions are standards-based, so it's really a matter of semantics. You could also read uchar_t as unicode character type.

It could also be argued that defining wchar as exactly 16 bits is confusing, given that the C++ standard leaves the width of wchar_t implementation-defined; it only has to represent the largest extended character set among the locales an implementation supports. Or defining char16 as "at least 16 bits" when the standard's char16_t is sized for a UTF-16 code unit, i.e. 16 bits.
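A quick check makes the point (a hypothetical standalone program, nothing Archicad-specific): print the widths and compare the output across platforms.
#include <iostream>

int main()
{
    std::cout << "wchar_t:  " << sizeof(wchar_t)  << " bytes\n"   // 2 on Windows, 4 on Linux/macOS
              << "char16_t: " << sizeof(char16_t) << " bytes\n";  // 2 on every mainstream compiler
    return 0;
}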

Horses for courses 😉
Ralph Wessel BArch
Software Engineer Speckle Systems